Interrelationships Among Measures of Job Satisfaction and General Satisfaction¹


Arthur H. Brayfield, Richard V. Wells²


Kansas State College 


and Marvin W. Strate 


City of Kansas City, Missouri 


The assessment of employee attitudes, both 
for practical and theoretical ends, currently is 
an active area in industrial psychology (2, 
10). Methodological problems, substantive 
findings, and theoretical implications have 
been reviewed recently (3, 11). In survey- 
ing the literature, however, it is surprising to 
find that one question of practical and theo- 
retical importance has been neglected. To 
what extent, if any, is attitude toward the job 
(job satisfaction) related to attitude toward 
life in general (general satisfaction)?

If the two sets of attitudes are closely re- 
lated it would be of considerable importance 
to determine, if possible, the nature of the 
functional relationship. If job satisfaction is 
a function of a general attitude toward life 
then an employer may be stymied in an at- 
tempt to “improve” job satisfaction among 
his employees since it is unlikely that he could 
influence general satisfaction as it arises apart 
from the job or is inherent in the individual 
as a response tendency. 

Hoppock (5) and Blum (2) have noted the 
significance of this relationship and, further, 
have suggested that job satisfaction may, in 
part, be a function of general satisfaction or 
attitude toward life. Roe recently has concluded that “it is in point of fact impossible to separate occupational adjustment from general life adjustment, or occupational satisfaction from satisfaction with life. One is a measure of the other, neither is prior to nor independent of the other, both are indications of the person in the world” (8, pp. 284-285).

1 The writers are indebted to Professor Walter H. Crockett who made helpful suggestions in the initial stages of the study.

2 Now with the Kaiser Aluminum and Chemical Corporation, Ravenswood, West Virginia.

Actually, there is very little empirical evi- 
dence assessing even the magnitude of the re- 
lationship; apparently there is no evidence 
regarding the functional relationship. 

The earliest reported study of this relation- 
ship was an investigation by Wesley (14) of 
the job satisfaction and general morale of for- 
mer University of Minnesota students. Using 
the Hoppock Job Satisfaction Blank and the 
Rundquist-Sletto Morale Scale as measures of 
what we would term job satisfaction and gen- 
eral satisfaction, respectively, he found a cor- 
relation of .31 among 211 employed males 
who were followed up approximately twelve 
years after they had enrolled at the Univer- 
sity. For another group of employed males, 
former college students studied eight years 
after college entrance, and a group of em- 
ployed females, former college students fol- 
lowed up approximately eight to twelve years 
after matriculation, he found that attitude to- 
ward the job was significantly and positively 
related to attitude toward life in general when 
he employed a chi-square analysis of the ma- 
terials. Miller (6) confirmed these findings 
for the same samples in a subsequent but 
slightly different analysis. 

More recently, Weitz (13) has published a 
preliminary account of his use of a 44-item 
Test of General Satisfaction with a sample of
life insurance agents. Typical items are: food 
prices, local speed limits, advertising methods, 
your first name, telephone service. The agents 
also checked a number of items concerning 
their jobs as sources of satisfaction or dissatis- 
faction. The number of general dissatisfac- 
tions checked correlated .39 with the number 
of job dissatisfactions. 

A somewhat related study by Watson (12) 
examined the relationship between scores on 
the Hall Scale for Measuring Occupational
Morale and A Life Satisfaction Index derived 
from items in the Bernreuter Personality In- 
ventory and the Strong Vocational Interest 
Test among 53% unemployed males during the 
depression. He found a correlation of .25. 

This scanty evidence is suggestive of a 
small but positive relationship between atti- 
tude toward the job and attitude toward life 
in general. 

A persisting methodological problem is involved here. To what extent are the measures within each attitude domain themselves related? For example, is Weitz’ General Satisfaction Test highly correlated with the Rundquist-Sletto Morale Scale? The same question holds for different job satisfaction measures. Three studies have attempted to determine the relationship between scales purporting to measure attitude toward the job. Hoppock
(5) found that his general, over-all measure 
of job satisfaction correlated .67 with another 
satisfaction index based on the sum of the re- 
sponses to 98 specific items dealing with the 
job and conditions of work in a sample of 100 
teachers. Ash (1) correlated the total scores 
on the Science Research Associates Employee 
Inventory with the scores on the Brayfield- 
Rothe Job Satisfaction Index and obtained a 
coefficient of .476 for a group of steel con- 
tainer fabricating plant employees. The total 
score on the Employee Inventory is a sum- 
mated item score from 76 specific items con- 
cerning the job. The Brayfield-Rothe Index 
uses items of a general nature to obtain a 
global or over-all attitude toward the job. 
Finally, Brayfield and Rothe (4) found a 
correlation of .92 between their Index and 
the Hoppock Job Satisfaction Blank, another 
global measure, among a group of employed
males and females. 

The available evidence, then, seems to in- 
dicate that an index of job satisfaction which 
is a summation of responses to specific items 
regarding the job will not necessarily give re- 
sults directly comparable to those obtained 
from an index developed to provide a global, 
over-all appraisal of job satisfaction by means 
of rather general items. 

No studies relating different measures of 
general satisfaction have been located al- 
though some of the investigations of relation- 
ships among personality measures may con- 
tain relevant data. 

The present investigation was designed to 
assess the magnitude of the relationship be- 
tween attitude toward the job and attitude 
toward life in general and to compare two dif- 
ferent scales which, by inspection, might be 
considered to be measures of each of these 
attitudes. 


Procedure 


The Brayfield-Rothe Job Satisfaction Index (4) 
and the SRA Employee Inventory (15) were chosen 
to represent, respectively, an over-all measure of job 
satisfaction using general items and a measure which 
is a summation of responses to specific job situation 
or content items. 

The two measures used to index general satisfac- 
tion or attitude toward life were the Rundquist- 
Sletto Morale Scale (9) and the Weitz Test of Gen- 
eral Satisfaction (13). In a sense, the former corre- 
sponds to the Brayfield-Rothe Job Satisfaction Index 
with respect to measurement approach while the lat- 
ter bears some similarity to the SRA Inventory in 
its use of specific items. Four political items in the 
original Weitz Test were replaced with food items. 
Also, Weitz scored his test by counting up the num- 
ber of general satisfactions. In this study, the total 
score was the number of satisfaction items checked 
less the number of dissatisfaction items plus 50 to 
avoid negative scores.
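The two scoring rules can be stated compactly. The sketch below assumes each of the 44 items is recorded as "S" (checked as a satisfaction), "D" (checked as a dissatisfaction), or anything else (left unchecked); these response codes and the function names are hypothetical, not part of the original test. Because the test has 44 items, the constant of 50 exceeds the largest possible number of dissatisfactions, so no total can fall below 6.

```python
def weitz_original(responses: list[str]) -> int:
    """Weitz's own scoring: the number of general satisfactions checked."""
    return sum(1 for r in responses if r == "S")


def weitz_modified(responses: list[str], offset: int = 50) -> int:
    """Scoring used in this study: satisfactions minus dissatisfactions plus 50."""
    satisfactions = sum(1 for r in responses if r == "S")
    dissatisfactions = sum(1 for r in responses if r == "D")
    return satisfactions - dissatisfactions + offset


# Hypothetical respondent: 20 satisfactions, 10 dissatisfactions, 14 unchecked.
answers = ["S"] * 20 + ["D"] * 10 + ["-"] * 14
print(weitz_original(answers))   # 20
print(weitz_modified(answers))   # 60
```

The offset only shifts every total by the same amount, so it leaves correlations with the other measures unchanged.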

The subjects were 41 male and 52 female civil 
service employees in a large midwestern city who 
were employed in three departments of the city gov- 
ernment—License Bureau, Assessor’s Office, and Of- 
fice Services. All were in office-type occupations. 
The men predominantly were in higher level classifi- 
cations which entail some independent judgment and 
carry the higher salaries. The females occupied more 
routine clerical positions. The men typically were 
in their forties and the women in their thirties. 

The materials were administered by two of the investigators to four groups of approximately 25 each as part of a pilot employee attitude survey under the sponsorship of the city administration. The anonymity of the responses was stressed although identification by broad job classification and department was obtained. The fixed order of presentation was SRA Inventory, Rundquist-Sletto Scale, Weitz Test, and Brayfield-Rothe Index.

Table 1
Reliabilities of Three Satisfaction Measures for
City Government Employees
(Male, N = 41; Female, N = 52)

Measure              r*
Rundquist-Sletto     .89
Weitz                .70
Brayfield-Rothe      .20

* Corrected by Spearman-Brown formula.


Results 


As a first step, a check was made of the re- 
liability of three of the measures. Because of 
the large number of items, as well as the 
amount of existing information, no check was 
made on the SRA Inventory and it was as- 
sumed that a reliability coefficient of .89 re- 
ported in the Manual (15) for the total score 
was representative. The corrected split-half 
reliabilities for the two samples are given in 
Table 1. These reliabilities were considered 
to be satisfactory for the purposes of this 
investigation although the Weitz test coeffi- 
cients were somewhat low. The latter may 
be compared to the reliability coefficient of 
.75 reported by Weitz for 168 life insurance 
salesmen (13). 
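For readers unfamiliar with the correction, the Spearman-Brown formula steps a split-half correlation up to the reliability of the full-length test, r_full = 2r_half / (1 + r_half). The half-test correlations are not reported here, so the sketch below only illustrates the arithmetic; the general form with a lengthening factor k and the function names are illustrative additions.

```python
def spearman_brown(r_half: float, k: float = 2.0) -> float:
    """Reliability of a test k times as long as the part whose reliability is r_half."""
    return k * r_half / (1 + (k - 1) * r_half)


def half_test_r(r_full: float) -> float:
    """Invert the k = 2 correction: the split-half r implied by a corrected coefficient."""
    return r_full / (2 - r_full)


# A corrected coefficient of .70 implies a correlation between halves of about .54.
print(round(half_test_r(0.70), 3))                  # 0.538
print(round(spearman_brown(half_test_r(0.70)), 3))  # 0.7
```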

The uncorrected intercorrelations among the 
four measures are reported in Table 2. The 
methodological findings are most appropri- 
ately discussed first. 

Among the males, the two measures of job 
satisfaction, Brayfield-Rothe and SRA, are 
significantly related although the correlation 
of .40 is not high. This coefficient may be 
compared directly with Ash’s (1) finding of 
.476 for a group 82% of whom were males.
Both of these coefficients are somewhat lower 
than Hoppock’s (5) correlation of .67 be- 
tween global, general item and summated, 
specific item score approaches. Hoppock’s 
coefficient probably is spuriously high since 
the scoring system for the summated item
method was based on an item analysis where 
the global job satisfaction scale provided the 
criterion for differentiating the extreme groups. 

Among the females, although the correla- 
tion of .20 between the two different methods 
of measuring job satisfaction is positive it is 
not statistically significant. 
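Whether a correlation of a given size is significant depends on the sample size. The sketch below uses Fisher's z transformation with a normal approximation, z = atanh(r)·sqrt(N - 3), to obtain a two-tailed p value for the hypothesis of zero correlation. The article does not state which test the authors used, so this is an illustrative substitute rather than their computation.

```python
import math


def r_two_tailed_p(r: float, n: int) -> float:
    """Approximate two-tailed p for H0: rho = 0 via Fisher's z (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # For a standard normal variable, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Brayfield-Rothe vs. SRA, the two job-satisfaction measures:
print(round(r_two_tailed_p(0.20, 52), 3))  # females: about .16, not significant
print(round(r_two_tailed_p(0.40, 41), 3))  # males: below .01
```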

These findings lend support to the generali- 
zation that these two somewhat different ap- 
proaches to the measurement of job satisfac- 
tion do not give interchangeable results. In 
this connection, it should be recalled that 
Brayfield and Rothe (4) reported a very
high relationship between two job satisfaction 
measures which did use the same global ap- 
proach. 

The relationship between the two measures 
of general satisfaction, Rundquist-Sletto and 
Weitz, is positive and significant in both
groups although it is somewhat higher for the 
males. The correlations of .57 and .43 for 
the males and females, respectively, indicate 
that there is some communality among the 
two measures. Even so, it should be stressed 
that there is considerable independence among 
the two methods of measuring general satis- 
faction. 

The major question raised in this investiga- 
tion was the extent to which measures of atti- 
tude toward the job and attitude toward life 
were related. 

To maximize the information available, each measure of job satisfaction was correlated with each measure of general satisfaction. This analysis also was suggested by the finding that the intercorrelations among each of the sets of measures were not particularly high, especially for the females.

Table 2
Intercorrelations Among Four Measures of Satisfaction
for City Government Employees

                                       Male        Female
                                       (N = 41)    (N = 52)
Measures                               r           r
Brayfield-Rothe vs. SRA                .40         .20
Rundquist-Sletto vs. Weitz             .57         .43
Rundquist-Sletto vs. SRA               .67         .04
Weitz vs. SRA                          .68         .12
Rundquist-Sletto vs. Brayfield-Rothe   .49         ...
Weitz vs. Brayfield-Rothe              .32         ...

* Significant at 5 per cent level.
** Significant at 1 per cent level.

Table 3
Means and Standard Deviations for Four Measures of
Satisfaction for City Government Employees

                     Male (N = 41)      Female (N = 52)
Measure              Mean     SD        Mean     SD
Rundquist-Sletto     81.10    10.15     81.47     7.98
Weitz                6?.23    10.48     67.57     9.89
Brayfield-Rothe      ?0.54    14.98     63.81     8.62
SRA                  42.84    16.5?     52.96    13.05

The results are striking in at least one respect. There are no statistically significant relationships between job satisfaction and general satisfaction among the female employees. Yet the measures of these same variables are significantly correlated in the male employee group.

Among the males, both measures of general satisfaction, Rundquist-Sletto and Weitz, bear a substantial relationship to one measure of job satisfaction, the SRA Employee Inventory total score, the correlations being .67 and .68, respectively. This finding would lead to the tentative conclusion that job satisfaction and general satisfaction are fairly highly related among males in comparable work situations.

However, this conclusion becomes somewhat more equivocal when the correlations with the Brayfield-Rothe measure are inspected. The correlation between the Brayfield-Rothe and the Rundquist-Sletto Morale Scale is only .49. If these measures had been the only ones used to test the magnitude of the relationship, it might have been described as somewhat lower. Actually, the Rundquist-Sletto vs. SRA Employee Inventory and the Rundquist-Sletto vs. Brayfield-Rothe correlations do not differ significantly from each other. However, the correlation of .32 between the Weitz and the Brayfield-Rothe actually is significantly different from the Weitz test’s correlation of .68 with the SRA Inventory. If the r of .32 between Weitz and Brayfield-Rothe had been used alone to index the relationship, it would have led to the conclusion that the relationship is a very modest one, although positive.
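The article does not say how the two correlations were compared. Because both coefficients involve the same test and come from the same 41 men, they are dependent, and an independent-samples comparison would be inappropriate. One standard option is Williams' t for two correlations sharing a variable, sketched below; it also needs the correlation between the two non-shared measures, here the male Brayfield-Rothe vs. SRA coefficient of .40. Treat the result as a check under that assumption, not as the authors' computation; the critical value 2.024 is the tabled two-tailed 5% point of t for 38 degrees of freedom.

```python
import math


def williams_t(r_jk: float, r_jh: float, r_kh: float, n: int) -> float:
    """Williams' t for H0: rho_jk = rho_jh, where j is the shared variable.

    r_kh is the correlation between the two non-shared variables; df = n - 3.
    """
    det_r = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det_r + r_bar**2 * (1 - r_kh) ** 3
    return (r_jk - r_jh) * math.sqrt((n - 1) * (1 + r_kh) / denom)


N_MALES = 41
R_BR_SRA = 0.40      # Brayfield-Rothe vs. SRA among males
T_CRIT_05 = 2.024    # tabled two-tailed 5% value of t for df = 38

t_weitz = williams_t(0.68, 0.32, R_BR_SRA, N_MALES)  # Weitz vs. SRA, Weitz vs. B-R
t_rs = williams_t(0.67, 0.49, R_BR_SRA, N_MALES)     # R-S vs. SRA, R-S vs. B-R
print(f"Weitz: t = {t_weitz:.2f}, significant: {abs(t_weitz) > T_CRIT_05}")
print(f"Rundquist-Sletto: t = {t_rs:.2f}, significant: {abs(t_rs) > T_CRIT_05}")
```

Under these assumptions the sketch agrees with the text: about 2.69 for the Weitz pair, exceeding the 5% value, and about 1.39 for the Rundquist-Sletto pair, which does not.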

It is apparent that generalizations regard- 
ing the relationship between job and general 
satisfaction must be qualified by considera- 
tion of the type of measures used. 

The situation with respect to the females is 
puzzling. Why are there no significant rela- 
tionships among these variables for the fe- 
male subjects? 

In part, this discrepancy may be attribut- 
able to the restriction in range of scores for 
the females on the four measures. The stand- 
ard deviations reported in Table 3 indicate 
that the women are somewhat more homoge- 
neous than the males on these measures. 

One plausible hypothesis is that work is a 
less important factor in the lives of these 
women than it is for the men. Some pe- 
ripheral evidence obtained in the course of 
the study bears on this. For example, the 
women were somewhat more likely than were
the men to say that their job did not give 
them a chance to work off their emotions, 
that it was not exciting, that it was nothing 
more than a way of making a living, and that 
they did not have to work for a living. Also, 
they tended to be somewhat more certain that 
they were doing as well in their jobs as their 
family expected, and that their family would 
not like for them to change their jobs; they 
were more satisfied with the prestige which 
their jobs gave them with their friends. Sixty- 
two per cent of the women reported that they 
had no dependents as compared to only 10% 
of the men; 63% of the women were married 
as compared to 88% of the men. Apparently 
the job was a matter of supplemental income 
for a considerable portion of the women. 

Possibly job level also is involved here. 
The men’s jobs may have been perceived as 
more important to their occupants because 
they actually were higher in the job classifi- 
cation hierarchy. Thus, because they are 
higher level jobs they might mean “more” or
something “different” to the men. For ex- 
ample, Morse and Weiss found among a sam- 
ple of employed men that the middle-class oc- 
cupations “are much more important to their 
occupants than are the working-class jobs to
their occupants” (7, p. 196). Wesley’s (14) 
finding of a positive relationship between 
job and general satisfaction among employed 
women may be relevant; it probably is sig- 
nificant that among his women subjects more 
than 50% were employed in professional, 
semiprofessional, and managerial positions. 

These “straws in the wind” seem to sup- 
port the notion that the job plays a more sig- 
nificant role in the lives of these men than it 
does for the women. The man’s job is more 
closely tied up with the satisfaction of many 
needs and thus becomes a pervasive influence. 
The observed sex difference in this study in 
the relationship between job and general satis- 
faction may be a clue to the functional rela- 
tionships involved. That is to say, if posi- 
tive relationships between job and general 
satisfaction had been found among both males 
and females then it would be impossible to 
make any inference regarding the functional 
relationships. However, in the absence of a
significant relationship within the female
group, it seems reasonable to argue that gen- 
eral satisfaction is not determinative of job 
satisfaction. Indeed, it may be suggested 
that, when the job is perceived as important 
in the life scheme as may be the case for the 
males here, general satisfaction becomes a 
function, in part at least, of job satisfaction. 

Again, the Morse and Weiss study is sug- 
gestive; they concluded from their investiga- 
tion of the meaning of work to their national 
sample of employed men that “the results in- 
dicate that for most men working does not 
simply function as a means of earning a liveli- 
hood . . . most men find the producing role 
important for maintaining their sense of well- 
being” (7, p. 198). 


Summary and Conclusions 


This study was conducted to discover the 
relationship between attitude toward the job 
and attitude toward life in general using two 
somewhat different measurement approaches 
for each variable. The subjects were 41 male 
and 52 female city government office em- 
ployees. 
The two different approaches to the measurement of job satisfaction bore a moderately
positive relationship to each other among the 
males but were unrelated in the female sam- 
ple. The two measures of general satisfac- 
tion were positively and significantly related 
in both samples with the correlation for the 
males being somewhat higher. 

Job satisfaction and general satisfaction were 
positively and significantly related among the 
males; no significant relationships were ob- 
tained among the females. 


Received June 4, 1956. 
Applying the Weighted Application Blank Technique to a Variety of Office Jobs¹


Wayne K. Kirchner and Marvin D. Dunnette 


Minnesota Mining and Manufacturing Company 


Psychological literature includes many ex- 
amples of the use of the weighted application 
blank in industry. Kreidt and Gadel (2) 
found, for example, that the best predictor of 
clerical turnover was biographical data scored 
on a weighted basis. Mosel and Wade (3) 
used a weighted application blank to help re- 
duce turnover among department store sales 
clerks. Dunnette and Maetzold (1) also used 
this technique with seasonal workers in a can- 
nery. In addition, there are many others, 
with the successful use of personal history 
data in life insurance sales selection being a 
prime example. 

The basic finding of a majority of these 
studies has been that personal history data 
will differentiate between groups chosen on 
some criterion, usually turnover. These data 
can be weighted successfully for predictive 
use. For the most part, however, studies have 
been concerned with persons working on one 
specific job rather than a group of different 
jobs. The aim of this report is to present re- 
sults of a weighted application blank study 
based on female office employees who held 
and performed a variety of jobs including 
clerical, stenographic, secretarial, and per- 
sonal contact. 


Method 


Application blanks for all female office employees 
in the home office and plant, hired for the first time 
by the company in 1954, were reviewed as a starting 
point for this study. These application blanks con- 
sisted of four-page folders, standardized and adopted 
for use at the end of 1953. Length of service was 
chosen as the criterion because of fairly high turn- 
over. The persons in the sample were grouped on 
this as follows: 


(a) 9 months and under (N = 33) 
(b) 10 to 18 months (N = 25) 
(c) 19 months and over (N = 105) 


1 The authors gratefully acknowledge the valuable 
aid of Jo Anne DeGidio and Elizabeth Andrew in re- 
viewing and recording the basic data. 


Blanks for long-term employees (19 months and 
over) were compared, variable by variable, with 
those of short-term (9 months and under) employees 
using the traditional approach of weighted applica- 
tion blank studies. A total of 40 variables were re- 
viewed. Of these, 15 were found to differentiate 
sharply between long- and short-term groups. Net 
weights of 0, 1, and 2 were assigned to responses for 
each of the 15 variables based on the tables of 
weights compiled by Stead and Shartle (4). 

The weights on each blank were added and a total 
score obtained for each blank. Possible range of 
scores for the 15 variables involved was, therefore, 
0-30. 
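The weighting and scoring steps can be sketched as follows. The study took its net weights from the Stead and Shartle tables, which are not reproduced here; the cut-offs in the hypothetical `net_weight` function (a gap of 20 or more percentage points earns 2, a gap of 10 or more earns 1) and the item names and percentages are placeholders that only show how a 0/1/2 key is built and summed. With 15 scored variables, each worth at most 2, totals range from 0 to 30.

```python
def net_weight(pct_long: float, pct_short: float) -> int:
    """Hypothetical 0/1/2 net weight from the percentage-point gap
    (long-term minus short-term) for one response alternative."""
    gap = pct_long - pct_short
    if gap >= 20:
        return 2
    if gap >= 10:
        return 1
    return 0


def build_key(pcts: dict[str, dict[str, tuple[float, float]]]) -> dict[str, dict[str, int]]:
    """pcts maps item -> response -> (% of long-term group, % of short-term group)."""
    return {
        item: {resp: net_weight(pl, ps) for resp, (pl, ps) in by_resp.items()}
        for item, by_resp in pcts.items()
    }


def score_blank(blank: dict[str, str], key: dict[str, dict[str, int]]) -> int:
    """Sum the weights of the responses on one blank; unkeyed responses score 0."""
    return sum(key[item].get(resp, 0) for item, resp in blank.items() if item in key)


# Hypothetical percentages for two of the scored variables.
pcts = {
    "marital_status": {"single": (70, 40), "married": (30, 60)},
    "shorthand": {"exceptional": (35, 20), "average": (50, 55), "none": (15, 25)},
}
key = build_key(pcts)
print(key)
print(score_blank({"marital_status": "single", "shorthand": "exceptional"}, key))  # 3
```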

The same process was repeated on application 
blanks for a 33% sample of female office employees 
hired for the first time by the company in 1955. 
This sample was grouped by length of service as 
follows: 


(a) 9 months and under (N = 40) 
(b) 10-15 months (N = 20) 
(c) 16 months and over (N = 45) 


As is seen, the long-term group in the cross-validation 
sample had a somewhat shorter period of tenure than
“long-termers” in the validation sample. This was 
done to insure a more equal grouping of long- and 
short-term employees. The same 15 variables identified
as most significant in the validation sample were
scored for each application blank and the scores 
summed to form a total. 


Results and Discussion 


A quick glance at Table 1 shows that long- and short-term employees were highly differentiated in the original 1954 group on the basis of application blank scores. Long-tenure employees received much higher scores on the average than did short-term persons. The difference between the mean scores is highly significant statistically. In terms of overlap, only 12% of the short-term group equal or exceed the median score of the long-term group (4 out of 33 cases).

Table 1
Quartiles and Means of Weighted Scores for Long-Term
and Short-Term Office Employees Hired in 1954
(Original Group)

Group    Q1     Median   Q3    Mean
Short    ...    10       13    10.79
Long     ...    18       21    17.59

Table 2
Quartiles and Means of Weighted Scores for Long-Term
and Short-Term Office Employees Hired in 1955
(Cross-Validation Group)

Group    Q1     Median   Q3    Mean
Short    10     12       15    12.32
Long     14     18       21    17.69
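The overlap figure is the proportion of short-term employees scoring at or above the long-term group's median. A minimal sketch follows, with made-up scores for the usage example; applied to the reported counts, the same arithmetic gives 4/33, or about 12%, for 1954 and 3/40, or 7.5%, for 1955.

```python
import statistics


def overlap(short_scores: list[float], long_scores: list[float]) -> tuple[int, float]:
    """Count and percent of the short-term group at or above the long-term median."""
    cutoff = statistics.median(long_scores)
    count = sum(1 for s in short_scores if s >= cutoff)
    return count, 100 * count / len(short_scores)


# Hypothetical scores, for illustration only.
short = [8, 10, 11, 12, 19]
long_ = [14, 17, 18, 21, 23]
print(overlap(short, long_))                              # (1, 20.0)

# Percentages implied by the reported counts.
print(round(100 * 4 / 33, 1), round(100 * 3 / 40, 1))    # 12.1 7.5
```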

This, in itself, is not conclusive because 
chance differences could be contributing to 
the differentiation. However, Table 2 pre- 
sents impressive evidence that the same vari- 
ables also differentiate effectively between 
long- and short-tenure female office employees 
in the 1955 group. Results for this cross- 
validation group, then, are highly similar to 
those of the original group. 

As is seen, the absolute difference between 
two means for the 1955 groups (5.37 points) 
is somewhat less than the difference in the 
1954 samples (6.80 points). This difference 
still is highly significant statistically; and, in 
terms of overlap, only 7.5% of the short-term
group equals or exceeds the median score of 
the long-term group (3 out of 40 cases). 
This is a difference of marked practical sig- 
nificance. The cross-validation samples actu- 
ally show less overlap than the validation 
samples. The reason for this, of course, is 
the smaller variance found in the 1955 short- 
term group as compared to the 1954 short- 
term group variance. Strangely enough, the 
distributions of scores for the long-term groups 
for both 1954 and 1955 are almost identical 
in nature. 

When length of service (in months) and weighted application blank score for each individual are correlated for the 1955 group, a respectable product-moment r of .42 is obtained. It seems obvious, then, that 15 personal history variables found on the application blank do differentiate consistently
between those female office employees who 
leave the company fairly soon after being em- 
ployed and those who stay much longer. 
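The product-moment coefficient relating months of service to blank score can be computed directly from its definition. The sketch uses invented data for six hypothetical employees; it does not reproduce the 1955 sample.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation from deviation scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical employees: months of service and weighted blank score.
months = [3, 6, 9, 14, 20, 26]
scores = [11, 9, 14, 13, 18, 16]
print(round(pearson_r(months, scores), 2))  # 0.82
```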

A composite picture of the long-term fe- 
male employee shows her to be young, single, 
from Minnesota, a graduate of high school or 
college, who does not have any college years 
left to finish, and without an auto or furni- 
ture. She has left her last job for advance- 
ment purposes, made a low salary on her last 
job, had few previous jobs, had commercial or 
business training in high school, is exceptionally good at shorthand, and has little experience at running office machines. She cannot
start immediately on her new job, has lost 
some time from school or work in the past 
year, and has been employed on her last job 
for over two years. Her father’s occupation 
is probably agricultural, clerical, or sales. 

The practical value of these results is ap- 
parent. Fifteen variables are present on the 
application blank that do differentiate be- 
tween long-tenure female office employees and 
short-tenure ones. Scores now can be obtained from the application blank of any female office applicant that will predict with a
good deal of accuracy whether or not she is a 
good risk in terms of tenure. 

The success of this study also suggests 
strongly that weighted application blank stud- 
ies need not be restricted to specific occupa- 
tions but can often be applied to broader oc- 
cupational groupings. It is highly probable 
that the weighted application blank, in the 
future, can be extended to cover many occu- 
pational areas hitherto untouched by such re- 
search. 


Summary 


1. Application blanks of female office em- 
ployees hired for the first time in 1954 were 
reviewed. Fifteen personal history variables 
were found to differentiate between long-term 
and short-term female office employees when 
weights were assigned to these variables. 

2. A similar study was made on application blanks of female office employees hired for the first time in 1955. Similar results were obtained for this 1955 cross-validation group.
The same 15 variables tended to differentiate 
markedly between long- and short-term em- 
ployees. In terms of overlap, there was less 
overlap in total scores between long- and 
short-term groups in the cross-validation sam- 
ple than in the validation group. 

3. Weighted application blank studies can 
be extended to cover broad occupational 
groups rather than being limited to one spe- 
cific occupation. 


Received December 17, 1956. 
The Attitudes Test in Human Relations (ATHURE)


Aaron J. Spector¹


Officer Education Research Laboratory, Maxwell Air Force Base, Montgomery, Alabama 


Evaluation of the effectiveness of the cur- 
riculum in human relations at the Air Univer- 
sity has required the development of an Atti- 
tudes Test in Human Relations (ATHURE). 
The forced-choice type of test was chosen be- 
cause of its presumed ability to reduce de- 
liberate falsification of responses which is one 
of the major sources of error in attitude meas- 
urement (6). 

Criterion 

The Officer Behavior Description (OBD), 
reported earlier in this journal (7), provided 
an objective description of officers’ human relations behavior “on the job.” The 494 Air
Force majors and lieutenant colonels attend- 
ing the Air Command and Staff School, Class 
of 1954, served as subjects for the OBD. 
Officers whose scores were in the upper and lower 27% on the OBD were selected as criterion groups for development of the ATHURE.
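Selecting the criterion groups amounts to ranking officers on the OBD and keeping the top and bottom 27%. A sketch follows, assuming scores are held in a dictionary keyed by a hypothetical officer identifier; ties at the cut point are broken arbitrarily by sort order, a detail the article does not address. With 494 officers, each group would contain about 133.

```python
def extreme_groups(scores: dict[str, float], fraction: float = 0.27) -> tuple[list[str], list[str]]:
    """Return (upper, lower) groups holding the top and bottom `fraction` of scorers."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = round(len(ranked) * fraction)
    if k == 0:
        return [], []
    return ranked[:k], ranked[-k:]


# Hypothetical OBD scores for 494 officers (a shuffled 0..493 sequence).
obd = {f"officer_{i:03d}": float((i * 37) % 494) for i in range(494)}
upper, lower = extreme_groups(obd)
print(len(upper), len(lower))   # 133 133
```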


Forced-Choice Procedures 


Methodologically, the Discrimination Index 
has found general acceptance in forced-choice 
procedures despite the fact that a variety of 
methods are used to obtain the measure. 
However, the Preference Index has been the 
source of considerable variation and controversy (1, 5, 8). Although it is not within the province of this paper to review all measures of attractiveness which have been suggested and used, one of them, the Favorableness Index, will be singled out for discussion because its weaknesses are found in many measures of this type.


Item Attractiveness 


Harding and Long (3) examined the relationships between Discrimination and Preference Indices, and concluded that Discrimination Indices of suitable magnitude could only be obtained from items which have average Preference Indices, as is evident in Table 1.* Highland and Berkshire (4) also observed this and used a Favorableness Index instead of the Preference Index. The procedures for measuring Favorableness or similar indices, e.g., Job Importance, require two groups of subjects and different instructions, one for the Discrimination and one for the Favorableness Index. The critical shortcoming of the latter type of index lies in the fact that the instructions create a testing climate entirely different from the one in which the test is ultimately administered. It is an unsatisfactory index because it is measured in a context containing no restraints; the subjects are not faced with any consequences of their responses. In a realistic setting there are constraints: either the subject will be punished for a low score if he does not answer correctly, or, if he is a rater, the person he is rating may be punished for a low score.

1 Now located at the U. S. Naval Personnel Research Field Activity, Washington, D. C.

Theoretically, an attractiveness index en- 
ables the test constructor to group items so 
that it appears to the respondent as though 
selection of one item would be just as good 
as selection of any other item in the group. 
The balance of negative and positive forces 
affecting selection should be comparable for 
all items. The situations are not comparable 
when in one case the respondent feels that he 
will be graded upon his test performance, 
while in the other case he feels like some- 
thing of an expert who is called upon to 
render a judgment. 

Since instructions to “indicate the favorableness of items” cannot produce the pressures of a realistic setting, the responses obtained thereby cannot validly indicate the items’ attractiveness in the real test. A contrary opinion is held by Lanman and Remmers (5) because “this (Favorableness) index permits the assembly of items in the appropriate manner, which we agree is the overriding consideration.”


* This table is an elaboration upon the one presented in (3).
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Table 1 
Hypothetical Criterion Group Data and the Preference 


and Discrimination Indices Which May 
Be Derived From Them 


Criterion Group Indices 


Discrimi Prefer 

Good Poor nation ence 
Hi Hi Lo Hi 
Hi Av LoAv HiAv 
Hi Lo Hi Av 
% Ay Hi LoAy HiAv 
6 Av Av Lo Av 
Ay Lo LoAy LoAv 
Lo Hi Hi Av 
Lo \y LoAyv LoAv 
Lo Lo Lo Lo 
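Table 1 can be read as two functions of the criterion-group levels. The sketch below assumes a numeric coding (Lo = 1, Av = 2, Hi = 3) that does not appear in the source; under that coding each Preference entry is the mean of the Good and Poor levels, each Discrimination entry reflects the size of the Good-Poor gap, and all nine rows of the table are reproduced.

```python
# Hypothetical numeric coding of the criterion-group levels (an assumption, not the authors').
LEVEL = {"Lo": 1, "Av": 2, "Hi": 3}

# Labels used in Table 1; half-steps correspond to the in-between LoAv and HiAv entries.
PREFERENCE_LABEL = {1.0: "Lo", 1.5: "LoAv", 2.0: "Av", 2.5: "HiAv", 3.0: "Hi"}
DISCRIMINATION_LABEL = {0: "Lo", 1: "LoAv", 2: "Hi"}  # keyed by |Good - Poor|


def indices(good: str, poor: str) -> tuple[str, str]:
    """Return the (Discrimination, Preference) labels for one item."""
    g, p = LEVEL[good], LEVEL[poor]
    discrimination = DISCRIMINATION_LABEL[abs(g - p)]  # how far apart the two groups are
    preference = PREFERENCE_LABEL[(g + p) / 2]         # how attractive the item is overall
    return discrimination, preference


if __name__ == "__main__":
    for good in ("Hi", "Av", "Lo"):
        for poor in ("Hi", "Av", "Lo"):
            print(good, poor, *indices(good, poor))
```

Because both indices depend only on the Good and Poor columns, the coding makes concrete why items could be screened from those two columns alone.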


On other grounds, Berkshire (2) has also 
rejected the Favorableness Index. 

The shortcomings of the dual testing ap- 
proach, as used to obtain such measures as 
Favorableness, imply that the data for the 
necessary indices must be obtained in one 
testing period. However, examination of 
Table 1 reveals that this method allows one 
to derive the Discrimination Index directly 
from the Preference Index, and similarly, the 
latter is a direct function of the data in the 
first two columns. Accordingly, one could 
select items by inspection of data in the first 
two columns, thereby eliminating the sta- 
tistical computation of the two indices. If 
the actual item frequencies were shown in the 
columns, items on which the difference in 
frequencies between criterion groups attained 
a given magnitude would be considered the 
differentiators. Distractors would be chosen 
which have response frequencies similar to 
those of the poor group’s responses to the 
differentiators. However, the two types of 
items would not be equated in terms of Pref- 
erence or Attractiveness. An illustration of a 
tetrad taken from the ATHURE, which used 
this approach, is the following: 


                  Frequency of Choices

Item      Good Group      Poor Group
a         25              12
b         25              12
c         12              12
d         11              13

where the frequencies are in terms of “dis-
agree” responses made on the items before
they were assembled into tetrads; the good
group chose items a and b more frequently
than did the poor group. In this tetrad one 
would expect the poor group to distribute its 
choices equally among the four items, while 
the good group would be expected to over- 
select the first two items. 
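One way to see the expected pattern is to express each item's "disagree" frequency as a share of its group's total within the tetrad. This is only an illustrative heuristic (the frequencies were gathered before the items were assembled, so they are not forced-choice data); the numbers are those of the tetrad above and the function name is hypothetical.

```python
# "Disagree" frequencies for the example tetrad: item -> (good group, poor group).
TETRAD = {"a": (25, 12), "b": (25, 12), "c": (12, 12), "d": (11, 13)}


def shares(tetrad: dict[str, tuple[int, int]], group: str) -> dict[str, float]:
    """Each item's share of one group's responses within the tetrad, rounded to 2 places."""
    col = 0 if group == "good" else 1
    total = sum(freqs[col] for freqs in tetrad.values())
    return {item: round(freqs[col] / total, 2) for item, freqs in tetrad.items()}


print(shares(TETRAD, "good"))  # {'a': 0.34, 'b': 0.34, 'c': 0.16, 'd': 0.15}
print(shares(TETRAD, "poor"))  # {'a': 0.24, 'b': 0.24, 'c': 0.24, 'd': 0.27}
```

The poor group's shares sit close to one quarter each, while the good group places about two-thirds of its responses on items a and b.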


ATHURE 
Attitudinal Items 


Many of the items on this test were adapted 
from questionnaires previously developed for 
Air Force research; some were from the Uni- 
versity of California’s F test, and some were 
submitted by the staff which teaches human 
relations at the Air University. 

A pool of 187 items was accumulated and 
edited; the 161 items which remained after 
editing were randomly ordered and prepared 
in booklet form. Respondents indicated their 
attitudes on a four-place scale, directly be- 
low each item, which ranged from “completely 
disagree” to “completely agree.” 


The First Administration 


As indicated earlier, the subjects for this 
test were field grade officers attending the 
Air Command and Staff School. The test 
was administered to the entire student body 
on the third day of class. On previous days 
they had taken other tests, such as the ACE, 
and special Air University tests. Conse- 
quently, they perceived this test as another 
one in the regular program at Air University. 

Only the 282 Air Force officers on whom 
OBD’s were completed by three raters (a sub- 
ordinate, a peer, and a superior) were uti- 
lized in developing the ATHURE. The up-
per and lower 27% of students on the OBD
were selected for criterion groups. 


Item Selection 


The subjects responded to each of the 161 
items by checking one position on a four- 
place scale, and the data later were di-
chotomized into “agree-disagree” responses to
increase the similarity between the develop- 
mental testing and the actual test conditions. 
Having equal Ns in the criterion groups
allowed comparisons to be made using fre- 
quencies, rather than percentages, of responses 
to each item. A minimum difference in fre- 
quencies between groups was arbitrarily es- 
tablished in order to assure obtaining a suffi- 
cient number of items for constructing the 
ATHURE. In this manner, 48 items were 
selected in which the differences in “disagree” 
responses between the criterion groups ranged 
from 5 to 16. 
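The selection rule can be sketched as follows. The split of the four-place scale into a "disagree" half (points 1-2) and an "agree" half (points 3-4), the toy responses, and the function names are assumptions for illustration; only the idea of counting "disagree" responses in equal-N groups and keeping items whose difference reaches a fixed minimum comes from the text.

```python
# Assumed coding of the four-place scale: 1-2 = disagree side, 3-4 = agree side.
DISAGREE = {1, 2}


def disagree_count(responses: list[int]) -> int:
    """Number of respondents on the 'disagree' side of the scale."""
    return sum(1 for r in responses if r in DISAGREE)


def select_items(good: dict[str, list[int]], poor: dict[str, list[int]],
                 min_diff: int = 5) -> dict[str, int]:
    """Keep items whose good-minus-poor 'disagree' frequency reaches min_diff in size.

    Raw frequencies are comparable only because the two groups have equal Ns.
    """
    if any(len(good[item]) != len(poor[item]) for item in good):
        raise ValueError("criterion groups must have equal Ns")
    selected = {}
    for item in good:
        diff = disagree_count(good[item]) - disagree_count(poor[item])
        if abs(diff) >= min_diff:
            selected[item] = diff  # the sign records which group disagreed more often
    return selected


# Toy data: 10 respondents per group (hypothetical).
good = {"q1": [1, 1, 1, 1, 1, 1, 2, 3, 4, 4], "q2": [3, 3, 2, 4, 1, 3, 4, 2, 3, 4]}
poor = {"q1": [3, 3, 4, 4, 4, 3, 2, 1, 4, 3], "q2": [3, 2, 4, 1, 3, 4, 2, 3, 4, 3]}
print(select_items(good, poor))  # {'q1': 5}
```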


Cross-Validation 


Prior to item selection, described above, 
the subjects in each criterion group were rank 
ordered by criterion scores and every third 
subject was pulled out for cross-validation 
purposes. Items were selected on the basis 
of the remaining two-thirds; N equaled 51 in 
each group. The high and low cross-valida- 
tion groups were then scored with the se- 
lected items, and tetrachoric and biserial r’s 
(widespread distributions), .58 and .51, re- 
spectively, were obtained. Thus, it appeared 
reasonable to expect these items to satisfac- 
torily fulfill the requirements of differentia- 
tors on the forced-choice test. 
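The group sizes reported here are mutually consistent, which the arithmetic below checks. It assumes that 27% of 282 is rounded to 76 per criterion group and that the holdout begins with the third-ranked subject; neither detail is stated in the text, and the function names are hypothetical.

```python
def criterion_group_size(n: int, fraction: float = 0.27) -> int:
    """Size of the upper (or lower) criterion group; 0.27 * 282 = 76.14 -> 76."""
    return round(n * fraction)


def split_every_third(ranked: list) -> tuple[list, list]:
    """Pull every third subject (3rd, 6th, ...) from a rank-ordered group for cross-validation."""
    holdout = ranked[2::3]
    development = [x for i, x in enumerate(ranked) if i % 3 != 2]
    return development, holdout


size = criterion_group_size(282)
development, holdout = split_every_third(list(range(1, size + 1)))
print(size, len(development), len(holdout))  # 76 51 25
```

The 51 development cases match the N reported for item selection, leaving 25 per group for cross-validation.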


The Assembled Test 


Form I of the test comprised three
parts, containing 14 tetrads and 6 triads. An
attempt was made to match differentiators 
which had similar patterns of criterion group 
responses, and next, to add distractors with 
matched response patterns. Multiple use of 
items was occasionally required in order to 
effect these matchings. 


The Second Administration 


The same students who had been used at 
the beginning of the course took Form I at 
the end of the term. During the nine months, 
however, the students had spent a consider- 
able amount of lecture and seminar time on 
the topic of human relations, and, theoreti-
cally, their attitudes changed as a result of
this experience. 

If the course was successful in inducing the
“correct” attitudes there would be no differ-
ences between the criterion groups at the end
of the term since it was presumed that the
“good” criterion group already had the atti-
tudes which the course intended to induce in
the other students. One would not expect, 
therefore, to obtain a high correlation be- 
tween the students’ criterion scores and the 
posttest scores. On the other hand, a find- 
ing of no-difference might indicate that the 
test was completely unreliable or, perhaps, 
not valid. 


Procedures 


To test the instrument's ability to reduce deliber-
ate falsification, a 2 × 3 design was set up, the vari-
ables being Fake-No Fake, and High, Middle, and
Low criterion groups. Each cell contained 38 sub-
jects.


Method 


Since all the students could not be assembled at 
one time for purposes of administering the test, all 
the necessary papers and instructions were enclosed 
in an envelope and deposited in each officer’s mail 
box. One-half of the subjects received their test five 
days earlier than did the second half; all received 
instructions to return them within three days of re-
ceipt. The Fake groups were directed to try to get
the highest score possible, even if it entailed choos-
ing items which were not consonant with their own
attitudes. The others were given traditional instruc-
tions. It was presumed that the latter groups would
falsify their responses to some extent, but not as
much as would the groups who were given explicit
instruction to do so.
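The degrees of freedom reported with the F ratio under Results follow from this design. A minimal sketch, taking the 38 subjects per cell stated above at face value and treating the six cells as the groups of a one-way analysis:

```python
from itertools import product

INSTRUCTIONS = ("Fake", "No Fake")
CRITERION = ("High", "Middle", "Low")
PER_CELL = 38  # subjects per cell, as stated in the Procedures

cells = list(product(INSTRUCTIONS, CRITERION))  # the six cells of the 2 x 3 design
n_total = PER_CELL * len(cells)
df_between = len(cells) - 1          # six cells treated as groups
df_within = n_total - len(cells)     # error degrees of freedom
print(len(cells), n_total, df_between, df_within)  # 6 228 5 222
```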


Results 


The differences between groups appear to
be negligible, as may be seen in Table 2.
The obtained F = 2.15, with 5 and 222 df,
approached significance (.05 < p < .10). Even
if the .10 level were accepted as significant,
the small differences between group means
would limit the test’s usability. As sug-
gested earlier, results such as those obtained
here might be attributable to the effective-
ness of the training program, or to the test’s
lack of validity or reliability. Accordingly, a
test-retest study of reliability was conducted
with the next class, using a newly developed
key and two additional forms.


Table 2

Group Means on Testing for Fakability

              Hi       N      Mid      N      Lo       N
Fake          20.03    42     21.26    44     19.47    38
No Fake       20.62    44     19.52    ?      19.61    38


Table 3

Analysis of Variance of Cross-Validation Data, Based
on 1/3 Sample on Fakability Study

Source                         Sums of Squares     df
Between                        164.24               5
   Treatment (Fakability)        1.69               1
   Class (Hi, Mid, Lo)         125.54               2
   Interaction                  37.01               2
Within                         690.43              66
Total                          854.66              71


New Key 


One-third of the cases in each group were 
pulled out for purposes of cross-validating a 
new key (B) which was developed from the 
above data. The variances of the scores of
the 1/3 sample were tested for homogeneity
(χ² = 5.35 with 5 df), and then an analysis
of variance was conducted (Table 3). A sig-
nificant F (p < .01) was obtained on class,
suggesting that the differences between means
of the High, Middle, and Low groups were
attributable to their classification on the cri-
terion instrument. Differences between the
High-Middle and High-Low groups were sig-
nificant (p < .05) by t tests. The data also
indicate that deliberate faking did not signifi- 
cantly increase the scores over those which 
were obtained by the No Fake groups, and 
that interaction between the major variables 
did not significantly affect the scores. 
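Table 3 gives only sums of squares and degrees of freedom. Dividing each by its df, and then by the within-groups mean square, recovers the F ratios behind the statements above; the sketch does only that arithmetic, and its names are illustrative.

```python
# Sums of squares and df for the effects in Table 3.
EFFECTS = {
    "Treatment (Fakability)": (1.69, 1),
    "Class (Hi, Mid, Lo)": (125.54, 2),
    "Interaction": (37.01, 2),
}
SS_WITHIN, DF_WITHIN = 690.43, 66


def f_ratios(effects: dict[str, tuple[float, int]], ss_within: float,
             df_within: int) -> dict[str, tuple[float, float]]:
    """Return {source: (mean square, F)}, using the within-groups mean square as the error term."""
    ms_within = ss_within / df_within
    return {src: (ss / df, (ss / df) / ms_within) for src, (ss, df) in effects.items()}


for source, (ms, f) in f_ratios(EFFECTS, SS_WITHIN, DF_WITHIN).items():
    print(f"{source:<24} MS = {ms:6.2f}   F = {f:5.2f}")

# The three effects add up to the Between sum of squares reported in Table 3 (164.24).
print(round(sum(ss for ss, _ in EFFECTS.values()), 2))
```

Class gives F of about 6.0 on 2 and 66 df, beyond the .01 level, while the Treatment (about 0.16) and Interaction (about 1.77) ratios are small, in line with the text; critical values should be taken from a standard F table.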


Reliability Study 

Two additional forms of the ATHURE 
were developed and all forms were adminis- 
tered to 1000 officers in the Command and 
Staff School, Class of 1957, midway through 
the class and again four weeks later. No in- 
struction in human relations was given be- 
tween the testing periods. 


Results 


None of the reliability coefficients between
two administrations of the same test, regard-
less of which key was used for scoring, was
of greater magnitude than .43. The maxi-
mum correlation (.50) was obtained between
two different forms, I and III, although the
test-retest correlations of these two forms were
.35 and .43, respectively. Many of the r’s
were of zero order.

The results of the first and second adminis- 
tration of the tests were also separately ana-
lyzed by the Kuder-Richardson formula with
similarly low coefficients. It appears that no 
form of the test is sufficiently reliable for op- 
erational use. Further indication of this is 
the size of the mean of the most reliable of 
the three forms—its mean is one point higher 
than the score one might expect by chance. 


Discussion 


The disappointingly low reliability is diffi- 
cult to reconcile with the fact that validity 
coefficients of the items prior to their being 
assembled into the forced-choice tetrads were 
.51 and .58 for biserial and tetrachoric r, 
respectively. It is apparent that the items 
initially were measuring what they were sup- 
posed to measure, but when they were assem- 
bled into forced-choice format the nature of 
the items changed. Items which were once 
independent of one another became, in the 
forced-choice form, relative items, dependent 
upon one another for their merit. Accord- 
ingly, one might expect item characteristics 
such as “discrimination indices” to wash out 
in the forced-choice test, particularly if they 
were not large to begin with. On the other 
hand, if they were sufficiently large one won- 
ders what virtue there would be to putting 
them into forced-choice form. 

Another basis for the obtained unreliability 
resides in the high probability of a person’s 
obtaining a chance score. For each tetrad 
the subjects were instructed either to choose 
the two they most agreed with or the two 
they most disagreed with. If only the dis-
criminating items are scored, a person has one
possibility in six of obtaining a better than
chance score and one in six of obtaining a
worse than chance score; he has four
possibilities in six of getting a chance score.
This situation would be considerably im-
proved by instructions to select one “most
agree” and one “most disagree” item, although
subjects are reported (6) to dislike this form
of test.
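The one-in-six and four-in-six figures can be checked by enumerating the six ways of picking two items from a tetrad. The sketch assumes, as in the a-b-c-d example earlier, that two of the four items are the scored differentiators; the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

ITEMS = ("a", "b", "c", "d")
KEYED = {"a", "b"}  # assumed: the two scored differentiators in the tetrad


def score_distribution(items, keyed, picks=2):
    """Probability of each tetrad score when `picks` items are chosen at random."""
    choices = list(combinations(items, picks))  # 6 equally likely pairs for 2 of 4
    counts = Counter(len(keyed.intersection(choice)) for choice in choices)
    return {score: Fraction(n, len(choices)) for score, n in sorted(counts.items())}


dist = score_distribution(ITEMS, KEYED)
print(dist)  # {0: Fraction(1, 6), 1: Fraction(2, 3), 2: Fraction(1, 6)}
print(sum(score * p for score, p in dist.items()))  # 1, the chance score per tetrad
```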

It is reasonable to believe that the low re- 
liability of this test may be attributed to the 
large opportunities for obtaining a chance 
score, and to the conversion of relatively uni- 
dimensional items into complex tetrads. 


Received September 13, 1956. 
A Simplified Formula for Measuring Abstraction in
Writing 


Paul J. Gillie 


Snohomish, Washington 


The presence of abstraction in writing is 
receiving increased attention through activity 
in the related fields of semantics, content 
analysis, and psychological aspects of com- 
munication. In many studies in these fields 
it would be useful to have a convenient and 
reliable method of measuring the level or de- 
gree of abstraction in written communication. 
No completely acceptable method of measur- 
ing abstraction has yet been devised. 

Although abstract writing is rather easily 
defined as that which contains a relatively 
high proportion of generalities and concepts 
far removed from basic life experiences, it has 
not been possible to secure universal agree- 
ment on precisely which words and phrases 
represent abstractions or on the compara- 
tive degree of abstraction of various abstract 
terms. Thus, in their basic research in read-
ability, Gray and Leary (3, pp. 105-106)
abandoned any attempt to measure abstrac- 
tion because of the difficulty of achieving ob- 
jectivity. One significant attempt to over- 
come this difficulty, by precisely defining 
“definite words,” has been made by Flesch 
(1) in an abstraction formula which employs 
the same technique used by him with consid- 
erable success in the measurement of read- 
ability. 

If the readability-formula technique is to 
be used for estimating abstraction, the ele- 
ments employed as indices should meet two 
qualifications, in addition to being valid: (a) 
they must be objectively identifiable, not sub- 
ject to variations which depend upon judg- 
ments of individual raters, and (b) they must 
be convenient to identify and enumerate so 
that application of the measure does not be- 
come so burdensome as to make it impracti- 
cal. It is conceivable that some accuracy 
may be sacrificed in order to meet these quali- 
fications. 

The elements used in most readability for- 
mulas are both convenient and unambiguous.
Counting elements which contribute either to
abstractness or concreteness is not so con- 
venient, however. Unless the directions for 
identifying abstract or concrete terms are so 
explicit as to be burdensome, the dependence 
upon individual judgment reduces reliability. 
The elaborateness of the formula directions, 
which contain sixteen substeps with thirteen 
limitations or exceptions, has been the prin- 
cipal objection to Flesch’s count of “definite 
words” (4). 

This study represents an attempt to sim- 
plify the measurement of abstraction by 
combining a few carefully selected elements 
related to abstraction in an experimental for- 
mula which can easily be applied to any writ- 
ten material. 

The procedures of this study included the 
following: 

1. Analysis of the sixteen categories of 
“definite words” listed by Flesch to determine 
which of these categories would be of great- 
est relevance in a simplified abstraction for- 
mula. 

2. Validation of a new and experimental 
index of abstraction. 

3. Calculation of a multiple regression 
equation based on the experimental index 
of abstraction and selected categories from 
Flesch’s list. 


Analysis of the Categories of 
“Definite Words” 


Flesch (1) lists sixteen types of “definite 
words” which are to be enumerated as part 
of his abstraction formula. The sum total of 
all such “definite words” can also be used as 
a measure of abstraction, according to a sim- 
plified and popularized version (2). These 
categories include: names of people; natural 
gender nouns; nouns denoting time; numeral 
adjectives; finite verb forms; present parti- 
ciples; personal pronouns; the words here 
there, then, and now; the words who, whom,
when, where, why, and how; the words this, 
that, these, those, each, same, and both; in- 
terrogative what and which; possessive pro- 
nouns; relative pronoun that; the words yes 
and no; interjections; the definite article the 
and the noun it modifies. 

While this detailed count of definite words 
is especially interesting and informative for a 
writer, its practical value for those interested 
in a quick estimate of abstraction is question- 
able, especially as the formula also requires 
an equally tedious syllable count. 

A procedure was set up for testing the abil- 
ity of each separate category to discriminate 
between abstract and concrete writing and 
also for determining which few of the sixteen 
categories could be profitably used in a more 
economical estimate of concreteness. Thirty 
200-word samples were selected from each of 
two anthologies. Man Against Nature (5), 
a collection of true adventure narratives, was 
chosen as a criterion of concrete writing. The
Age of Analysis (6), an anthology of modern 
philosophy, was chosen as a criterion of ab- 
stract writing. In accounts of human adven- 
ture one would expect concrete, specific, and 


vivid words to be predominant; in philoso- 
phy, the use of abstractions is to be expected. 
The selection of anthologies as criteria as- 
sured that many different writers would be 
represented in this portion of the study. 

For all sixty samples a count was made of
the number of words found in each of the six-
teen categories of “definite words.” The num-
ber of “nouns of abstraction,” defined later in
the study, was also counted. From these
data, point biserial coefficients of correlation
were computed as indices of the relative abil-
ity of each category to discriminate between
abstract and concrete writing, in much the
same manner as this coefficient is used in test
item analysis to determine the ability of items
to distinguish between upper and lower cri-
terion groups. In this case, the adventure
anthology was denoted the “upper” criterion, as a
larger number of “definite words” was ex-
pected in this book than in the philosophy
book.
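For a category count, the point biserial coefficient compares the mean count in the "upper" (adventure) samples with the mean in the "lower" (philosophy) samples, scaled by the overall standard deviation. A minimal sketch with made-up counts; the function name is illustrative, and the formula is algebraically the same as a Pearson r with group membership coded 1 and 0.

```python
from math import sqrt


def point_biserial(upper: list[float], lower: list[float]) -> float:
    """r_pb between membership in the upper group (coded 1) and a per-sample count."""
    values = upper + lower
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((x - mean) ** 2 for x in values) / n)  # population SD of all samples
    p = len(upper) / n                                    # proportion of samples in the upper group
    m_upper = sum(upper) / len(upper)
    m_lower = sum(lower) / len(lower)
    return (m_upper - m_lower) / sd * sqrt(p * (1 - p))


# Hypothetical counts of one category in four 200-word samples from each anthology.
adventure = [14, 12, 15, 11]
philosophy = [6, 8, 5, 9]
print(round(point_biserial(adventure, philosophy), 3))
```

A negative coefficient means the category occurred more often in the philosophy samples, which is what "negatively discriminating" denotes.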

The sixteen categories are listed in Table 1,
in the order of their apparent ability to dis-
criminate. One category alone, definite ar-
ticles, actually proved to be superior to the
total count of definite words in discriminating
ability. Three categories were negatively dis-
criminating: they were found more often in
abstract writing than in concrete writing.


Table 1

Relative Occurrence and Discrimination Coefficients of
Flesch’s 16 Categories of “Definite Words”
Computed from Samples Taken from
Two Anthologies

Flesch                                                     Discrimination
Category   Description of                   Relative       Coefficient
Number     Category                         Occurrence     (r pbis)
16         Definite article                 .047           .68
3          Time nouns                       .007           ?
6          Present participles              .008           ?
15         Interjections                    .000           ?
2          Natural gender nouns             .004           ?
7          Personal pronouns                .046           ?
5          Finite verbs                     .074           .20
4          Numeral adjectives               .022           ?
12         Possessive pronouns              .043           ?
9          Who, whom, when, where           .007           ?
8          Here, there, then, now           .005           ?
13         Relative pronoun that            .004           ?
14         Yes and no                       .000           ?
11         Interrogative what and which     .003           ?
10         This, that, etc.                 .021           ?
1          Names of people                  .010           ?
           Totals                           .293           ?

From the list of sixteen categories two were 
chosen for further study to be included with 
a count of “nouns of abstraction” in a simple 
abstraction formula. Category 16 (definite 
article) and Category 5 (finite verbs) were 
selected. These categories had discrimination 
coefficients of .68 and .20, respectively. 
Categories 3, 6, 15, and 2 were not selected
despite relatively high coefficients of discrimi-
nation because of their low frequency.


Predicting Abstraction Score 


The next step in the study was to determine 
the ability of these two categories and the 
count of “nouns of abstraction” to predict the 


Flesch abstraction score. This was accom-
plished by taking one hundred 200-word sam-
ples from a wide variety of reading materials 
representing different types of content and 
many levels of difficulty and abstraction. The 
sources included five magazines (True Con-
fessions, Reader’s Digest, Saturday Evening
Post, Newsweek, and Atlantic Monthly); so-
cial studies and reading texts at the fourth-, 
sixth-, and eighth-grade levels; and college 
level texts in social psychology and introduc- 
tory philosophy. 

The complete Flesch abstraction score for 
each of the hundred samples was computed. 
This involved counting the total number of 
definite words and figuring the average word 
length in syllables and combining these two 
factors according to the formula directions. 
A count was also made of the number of 
words in Categories 5 and 16 and the num- 
ber of “nouns of abstraction.” The correla-
tion of each of these elements with the Flesch
abstraction score was then computed. 

“Nouns of abstraction” were arbitrarily de- 
fined as those ending in certain suffixes which 
are used to denote generalizations, abstract 
conditions, and qualities. These seven suf- 
fixes were: -ness, -ment, -ship, -dom, -nce (as 
in intelligence or fragrance), -ion, and -y. 
The -y suffix includes many compound suf- 
fixes as in geology, hostility, and agronomy. 
It does not include the -y suffix when used as 
a diminutive. Nouns with these endings do 
not, of course, constitute all of the abstract 
words in the English language, nor are all 
such nouns equally abstract. Their common 
endings, however, make them easy to locate 
and enumerate. In most cases they represent 
generalized concepts which do not have im-
mediate tangible referents. This element,
“nouns of abstraction,” had a point biserial
coefficient of .52 on the adventure and phi-
losophy criteria, establishing it as more dis-
criminating than fifteen of the sixteen Flesch
categories.


Table 2

Correlations Among Measures of Abstraction

                               Categories     Nouns of
                               5 and 16       Abstraction
Flesch abstraction score       .6680          -.7411
Categories 5 and 16                           -.4772
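Because nouns of abstraction are identified by their endings, the count is easy to mechanize once the nouns have been picked out. The sketch below assumes the nouns are already listed and that the reader supplies the exclusions the rule leaves to judgment (diminutive -y, and words such as boy where -y is not a suffix); the crude plural stripping and the function names are also assumptions.

```python
SUFFIXES = ("ness", "ment", "ship", "dom", "nce", "ion", "y")


def singular(word: str) -> str:
    """Very rough plural stripping, adequate for nouns with these suffixes."""
    w = word.lower()
    if w.endswith("ies"):
        return w[:-3] + "y"        # hostilities -> hostility
    if w.endswith("sses"):
        return w[:-2]              # kindnesses -> kindness
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]              # movements -> movement
    return w


def count_abstraction_nouns(nouns: list[str], exclude=frozenset()) -> int:
    """Count nouns (plurals included) ending in one of the seven suffixes, minus exclusions."""
    count = 0
    for noun in nouns:
        base = singular(noun)
        if base in exclude:
            continue               # judged by the reader not to carry the suffix
        if base.endswith(SUFFIXES):
            count += 1
    return count


nouns = ["kindness", "movements", "friendship", "kingdom", "fragrance", "nations",
         "hostilities", "geology", "doggy", "table", "boys"]
print(count_abstraction_nouns(nouns, exclude={"doggy", "boy"}))  # 8
```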

The correlations obtained among the three 
elements and the Flesch abstraction score are 
shown in Table 2. There was a high corre- 
lation (— .7411) between “nouns of abstrac- 
tion” and the Flesch score. 

These correlations were utilized in the com-
position of a multiple regression equation hav-
ing the Flesch abstraction score as the de-
pendent variable and with two independent
variables (Categories 5 and 16, $X_2$; nouns
of abstraction, $X_3$). Multiple regression was
used here as the best possible method of ob-
taining an indication of the relative weight to
be assigned each element in estimating the
degree of abstraction. The original regres-
sion equation read: $X'_1 = 3.045 + .0829X_2 - .1665X_3$.
This was multiplied by twelve and rounded
to obtain a simple formula reading: Abstraction
level = 36 + (number of definite articles and
their nouns, i.e., Category 16) + (number of
finite verbs, i.e., Category 5) - (2 × number of
nouns of abstraction), all counts per 200 words.

Multiple R for the formula is .8229. R of 
this size indicates that the formula will, on 
the average, give results very nearly equal to 
the Flesch abstraction formula. This formula 
is considerably easier to apply; any piece of 
writing can be tested by this formula in a 
fraction of the time required by the Flesch 
formula and with quite similar results. 
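The reported multiple R can be recovered from the three correlations in Table 2 with the standard two-predictor formula, and the scaling step can be checked the same way. The sketch reads the Table 2 entries as r = .6680 (Flesch score with Categories 5 and 16), r = -.7411 (Flesch score with nouns of abstraction), and r = -.4772 (between the two predictors); the variable names are illustrative.

```python
from math import sqrt

# Zero-order correlations read from Table 2.
R_Y2 = 0.6680   # Flesch score with Categories 5 and 16 (X2)
R_Y3 = -0.7411  # Flesch score with nouns of abstraction (X3)
R_23 = -0.4772  # X2 with X3


def multiple_r(r_y2: float, r_y3: float, r_23: float) -> float:
    """Multiple correlation of a criterion with two predictors, from zero-order r's."""
    r_squared = (r_y2**2 + r_y3**2 - 2 * r_y2 * r_y3 * r_23) / (1 - r_23**2)
    return sqrt(r_squared)


print(round(multiple_r(R_Y2, R_Y3, R_23), 4))  # 0.8229

# Multiplying the raw-score equation by twelve, as described in the text.
intercept, b2, b3 = 3.045, 0.0829, -0.1665
print([round(12 * v, 3) for v in (intercept, b2, b3)])  # [36.54, 0.995, -1.998]
```

The scaled weights round to 1 and 2, the coefficients of the simple formula; the scaled constant of 36.54 appears as 36 in the published formula.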

There are no precise and objective criteria 
for validating a measure of abstraction. Since 
this formula was derived in part from the 
Flesch abstraction formula and since the to- 
tal Flesch formula was used as the criterion 
for its validation, this formula cannot be pre- 
sumed to be any more valid as a measure of 
abstraction than the Flesch formula. Its chief 
advantage, therefore, is its ease of application. 


Directions for Use of the Formula 


This formula should not be used for any
piece of writing containing fewer than 200
words. Selections shorter than 500 words
can be tested without sampling. Long arti-
cles, magazines, and books can be tested by
taking several 200-word samples and averag-
ing the results.


1. Count the number of finite verbs per 
200 words. Count all verbs of any tense 
which are in the first, second, or third per- 
son and which have subjects, either expressed 
or understood. Do not count nonfinite verb 
forms or verbals. In verb forms with aux- 
iliary words, count the auxiliary rather than 
the main verb. Do not count any form of 
the verb “to be” (is, was, are, were, will be, 
have been, etc.) when used only as a copula 
to link the subject with a predicate comple- 
ment. 

2. Count the number of definite articles 
and their nouns per 200 words. Count both 
the article the and the noun it modifies, but 
only if that noun is a single word not other- 
wise modified, either by an intervening adjec- 
tive or by a clause or phrase following the 
noun. Do not count the when modifying 
adjectives or noun-adjectives, as in the best, 
the Irish. 

3. Count the number of nouns of abstrac- 
tion per 200 words. Count all nouns ending 
in the suffixes -ness, -ment, -ship, -dom, -nce, 
-ion, and -y, including the plurals of such 
nouns. Count nouns ending in -y even when 
it is the end of a longer suffix like -ity or
-ology but not when it is used as a diminu- 
tive. 

4. Add the numbers found in Steps 1 and 
2 and add 36 to this sum. 

5. Multiply the number found in Step 3 
by 2. 

6. From the total found in Step 4, sub- 
tract the result of Step 5. The result of this 
subtraction is the abstraction score. Scores 
should be interpreted as follows: 0-18, very 
abstract; 19-30, abstract; 31-42, fairly ab-
stract; 43-54, standard; 55-66, fairly con-
crete; 67-78, concrete; 79-90, very concrete. 
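Steps 4 through 6 and the interpretation ranges can be written as two small functions. The rescaling of counts from a passage of other than 200 words to a 200-word basis, the handling of scores outside 0-90 or between the published integer ranges, and the function names are assumptions added for this sketch.

```python
# Upper limits of the published interpretation ranges (Step 6).
BANDS = [
    (18, "very abstract"),
    (30, "abstract"),
    (42, "fairly abstract"),
    (54, "standard"),
    (66, "fairly concrete"),
    (78, "concrete"),
    (90, "very concrete"),
]


def abstraction_score(finite_verbs: int, article_words: int, abstraction_nouns: int,
                      words: int = 200) -> float:
    """Simplified abstraction score; counts from a passage of `words` words are rescaled to 200."""
    scale = 200 / words
    step4 = 36 + scale * (finite_verbs + article_words)  # Steps 1, 2 and 4
    step5 = 2 * scale * abstraction_nouns                  # Steps 3 and 5
    return step4 - step5                                   # Step 6


def interpret(score: float) -> str:
    """Verbal label for a score; scores above 90 are treated as 'very concrete' (assumption)."""
    for upper, label in BANDS:
        if score <= upper:
            return label
    return "very concrete"


score = abstraction_score(finite_verbs=20, article_words=16, abstraction_nouns=7)
print(score, interpret(score))  # 58.0 fairly concrete
```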

The categories listed in Step 6 above cor- 
respond to the seven categories used in inter- 
pretation of the Flesch formula. For the sake 
of relevance, the words “abstract” and “con- 
crete” were used in place of the words “diffi- 
cult” and “easy.” The results of application 
of this formula should be more meaningful 
if reported in terms of the verbal descriptions
rather than the numerical scores. As is the 
case with any of the readability formulas, 
numerical scores may imply an unwarranted 
degree of precision.

This formula, when applied to the content 
of three magazines, gave the following results: 
True Confessions, 68, concrete; Reader's Di- 
gest, 51, standard; Atlantic Monthly, 41, 
fairly abstract. A college level philosophy 
text had a score of 31. These figures give 
some basis for comparison of additional re- 
sults. 


Summary 


Through an analysis of the separate cate-
gories making up Flesch’s count of “definite
words” and through the introduction of a new
index of abstraction, a convenient and simple
formula for estimating abstraction in writing
was devised. This formula was derived in
part from the Flesch abstraction formula and
was validated against the total Flesch for-
mula. It contains elements which are easy
to identify, which discriminate between ab-
stract and concrete writing, and which corre-
late with related measures. It gives results
quite similar to those of the lengthier and
more complex Flesch formula. Future use of
the formula will help to ascertain its validity
and its limitations.


Received September 10, 1956 
Effect of Curved Text Upon Readability of Print¹


Miles A. Tinker 


University of Minnesota 


When a printed page is lying flat upon a 
reading table which is perpendicular to the 
line of vision, the accommodation changes re- 
quired for clear seeing as the eyes shift from 
fixation to fixation along a line of print are 
relatively slight and do not seem to interfere 
with the mechanics of reading. Furthermore, 
in such a situation the letters and word forms 
are clearly defined and readily perceived. In
many books, such as thick textbooks, dic- 
tionaries, and bound journals, the printed 
page does not lie flat. In these volumes there 
is considerable curvature of the page near the 
inner or gutter margin where the pages are 
bound. Traditional practice is to have this 
inner margin narrowest with relatively wide 
outside and bottom margins. In an example 
of a typical journal page the widths of mar- 
gins are: inner, 1%, inch; top, '%¢ inch; 
outside, 1%, inches; and bottom, 1%@ inches.
In a large book on typographical practice the 
margins run: inner, ''4, inch; top, 1, inch; 
outside, 1 inch; and bottom, 1! inches. Use 
of such narrow inner margins results in ap- 
preciable curvature of about one-third of the 
line of print in single column printing and up 
to four-fifths of the line in multiple column 
printing for the column next to bound edge. 

This curvature of the printed page may 
well interfere with easy and rapid perception 
in reading. First, the eye must accommodate 
anew to read successive words with the change 
in focal distance that occurs with each eye 
movement to a new fixation along the curved 
surface. Since accommodation changes are 
relatively slow, speed of reading may be re- 
tarded when the reading is done under such 
conditions. Secondly, when part of a curved 
line of print is farther from the eyes than 
other parts, the letter and word forms along 
the line are distorted. This distortion may 
seriously interfere with visibility and hence
with clear and rapid perception when there is
considerable curvature of the printed page.

1 This research was supported by a grant from the
Graduate School, University of Minnesota.

The specific problem in this study is to 
investigate changes that occur in speed of 
reading and in visibility of words when the 
reading copy is curved. The results should 
provide some evidence concerning handicaps 
to easy and speedy perception while reading 
curved lines of print in large books. Speed 
of perception will be measured in terms of 
speed of reading with comprehension con- 
stant. The effect from distortions of letter 
and word forms on ease of perceiving will be 
measured in terms of visibility of selected 
words. 


Materials and Procedure 


The reading material consisted of Forms A and B 
of the Chapman-Cook Speed of Reading Test. The 
typography was identical in the two forms: 10-point 
Scotch Roman type face with 2-point leading in a 
19-pica line width on eggshell paper stock. This 
test has 30 paragraphs of 30 words each in Form A 
and in Form B. The paragraphs in a single form 
are in two columns of 15 paragraphs each printed 
on a single page. As a check on comprehension, the 
reader has to indicate which word in a paragraph 
spoils the meaning in that paragraph. For college 
students, the accuracy of this response is 99.7%. 
This means that comprehension is virtually constant 
and that speed of perception in reading is measured 
as a single variable. 

The two forms of the test were cut down the cen- 
ter to provide four test forms of 15 paragraphs each, 
labeled A1, A2, B1, B2. Each of these forms was
mounted on a light gray cardboard which could be 
attached to the curved reading stand. 

Materials for visibility measurements consisted of 
20 five-letter words cut from the reading tests. Each 
word was mounted on a 3 × 5-inch index card.
These words were divided into five groups of four 
words each. The first group was a practice series 
and the others made up four experimental series. 

A special reading stand was constructed. It con- 
sisted of a cylinder 8 in. in diameter which could be 
set in any position from the horizontal to the verti- 
cal. This curved surface had about the curvature of 
a page in a large book lying open on a table. An 
additional reading stand with a flat surface was set
at 45° to the table top to provide optimal conditions
for reading flat copy.

The experiment was conducted in a light labora- 
tory with 25 footcandles of well diffused indirect 
illumination. Subjects were 104 college students. 
Each subject was tested individually. There were 
practice exercises for speed of reading and visibility 
measurements. The visibility measurements were 
made with the Luckiesh-Moss Visibility Meter. A 
head rest maintained the subject’s head in approxi- 
mately a constant position with the eyes at 15 in. 
from the copy. Order of subjects was systematically 
rotated. 

On arrival at the laboratory, a subject was adapted to the illumination employed while doing the practice exercises. In Group I, the Control Group, all reading and visibility measurements were done with material on the flat reading surface set at 45° to the table top. The four forms of the reading tests (A1, A2, B1, B2) were read in that order, and the time in seconds to complete each form was recorded. The visibility measurements were taken in a similar manner. The successive groups of four words each were labeled V1-V4, V5-V8, etc. In Test Group II, the order for speed of reading was: Form A1 flat on the reading stand at 45°; A2 on the curved apparatus at 45°; B1 on the curved apparatus in the horizontal position (0°); and B2 on the curved apparatus in the vertical position (90°). Visibility measurements were made in the same positions with the successive groups of words.


Results and Discussion 


The basic data for the speed of reading measurements are given in Table 1. The comparisons in Groups Ia, Ib, and Ic, the Control Group, yield data on the equivalence of the four sets of 15 paragraphs. These data were employed to “correct” the differences in Column 6. In Group IIa, the curved text was read 7.2% slower than flat text when both were at 45° to the horizontal (table top). When the cylinder holding the curved text was horizontal (0°), speed of reading was retarded 36.5% in comparison with flat text at 45°. And when the cylinder holding the curved text was vertical (90°), speed of reading was retarded by 11.4%. Curved text, therefore, retards speed of reading by statistically significant amounts in all three alignments. The most serious retardation occurred when the alignment was horizontal, as when a large book is lying on a flat table.


Table 1

Effect of Curved Text on Speed of Reading

(The mean score is the average number of seconds taken to read 15 paragraphs of 30 words each in the Chapman-Cook Speed of Reading Test. In each test group, N = 52 college students, 104 in all)

                                                              Difference Between
                                                              Corrected Means* in
Group   Test Form   Position of Copy   Mean in Seconds   SE_M   Seconds   Per Cent
Ia      A1          Flat: 45°          83.75             2.99
        A2          Flat: 45°          90.06             3.33   0.00      0.0
Ib      A1          Flat: 45°          83.75             2.99
        B1          Flat: 45°          81.88             3.22   0.00      0.0
Ic      A1          Flat: 45°          83.75             2.99
        B2          Flat: 45°          80.69             3.13   0.00      0.0
IIa     A1          Flat: 45°          85.23             2.84
        A2          Curved: 45°        97.63             3.55   6.10      7.2
IIb     A1          Flat: 45°          85.23             2.84
        B1          Curved: 0°         114.52            5.63   31.15     36.5
IIc     A1          Flat: 45°          85.23             2.84
        B2          Curved: 90°        91.86             4.19   9.69      11.4

* The difference between the means in each of groups IIa, IIb, and IIc is “corrected” by the amount of the difference in the corresponding Control Group (Ia, −6.31; Ib, +1.87; Ic, +3.06 sec.).
Table 2

Effect of Curved Text on Visibility of Words

(The mean visibility score is the average for perceiving four words by use of the Luckiesh-Moss Visibility Meter. In each test group, N = 52 college students, 104 in all)

                                                    Difference Between
                                                    Corrected Means* in
Group   Word Series   Position of Copy   Mean   SE_M   Score   Per Cent
Ia      V1-V4         Flat: 45°          4.43
        V5-V8         Flat: 45°          4.37          0.00    0.0
Ib      V1-V4         Flat: 45°          4.43
        V9-V12        Flat: 45°          4.71          0.00    0.0
Ic      V1-V4         Flat: 45°          4.43
        V13-V16       Flat: 45°          4.74          0.00    0.0
IIa     V1-V4         Flat: 45°          4.02
        V5-V8         Curved: 45°        3.57   .13    0.39    9.73    5.95
IIb     V1-V4         Flat: 45°          4.02
        V9-V12        Curved: 0°         3.46   .12    0.84    20.0    8.63
IIc     V1-V4         Flat: 45°          4.02
        V13-V16       Curved: 90°        2.76   .09    1.57    39.1    16.58

* The difference between the means in each of groups IIa, IIb, and IIc is “corrected” by the amount of the difference in the corresponding Control Group (Ia, +0.06; Ib, −0.28; Ic, −0.31).


The basic data for the visibility measurements are given in Table 2. The Control Groups (Ia, Ib, Ic) provide data on the equivalence of the groups of words used. In comparison with flat copy at the 45° position, curved text at the same position reduced visibility by 9.73%; curved print in the horizontal position reduced visibility by 20%; and curved print in the vertical position reduced it by 39.1%. Visibility of curved print, and hence ease of perceiving words, is therefore reduced significantly in all alignments.

The results in Tables 1 and 2 show that readability of print is adversely affected to a marked degree when the copy is curved so that parts of the lines of print are at different distances from the eyes. This is true both for speed of reading and for visibility of word forms. Although accommodation changes occur while reading the curved text, it is difficult to estimate the relative importance of these changes in this experiment. The effects of the curved print on visibility are so great that they would seem to be dominant in reducing speed of reading.
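The “correction” used in Tables 1 and 2 is simple arithmetic: the raw difference between a Test Group's curved-copy mean and its flat-copy mean is adjusted by the difference the matching Control Group showed between the same two forms (or word series) when both were read flat. The Python sketch below assumes the correction is the Control Group's first-form mean minus its second-form mean, added with the sign given in the table footnotes; the function and variable names are illustrative, not the authors'. Because it works from the rounded means, its results may not match the printed figures exactly.

```python
def corrected_difference(flat_mean, test_mean, control_correction):
    """Raw difference (second form minus flat A1 or V1-V4 form) plus the
    Control Group correction (its first-form mean minus second-form mean),
    which removes the part of the difference due to the forms themselves."""
    return (test_mean - flat_mean) + control_correction


def percent_of_flat(diff, flat_mean):
    """Express a corrected difference as a percentage of the flat-copy mean."""
    return 100.0 * diff / flat_mean


# Table 1: seconds to read 15 paragraphs (a positive difference = slower).
SPEED_FLAT = 85.23
SPEED = {  # alignment: (curved-copy mean, Control Group correction, sec.)
    "curved 45 deg": (97.63, -6.31),
    "curved 0 deg": (114.52, +1.87),
    "curved 90 deg": (91.86, +3.06),
}

# Table 2: visibility scores (a negative difference = less visible).
VIS_FLAT = 4.02
VIS = {
    "curved 45 deg": (3.57, +0.06),
    "curved 0 deg": (3.46, -0.28),
    "curved 90 deg": (2.76, -0.31),
}

if __name__ == "__main__":
    for label, (mean, corr) in SPEED.items():
        d = corrected_difference(SPEED_FLAT, mean, corr)
        print(f"speed, {label}: {d:+.2f} s ({percent_of_flat(d, SPEED_FLAT):+.1f}%)")
    for label, (mean, corr) in VIS.items():
        d = corrected_difference(VIS_FLAT, mean, corr)
        print(f"visibility, {label}: {d:+.2f} ({percent_of_flat(d, VIS_FLAT):+.1f}%)")
```

Keeping the correction as a separate signed input makes it easy to see how much of each raw difference is attributable to form (or word-series) differences rather than to curvature.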


The curved text employed in this experi- 
ment is a close approximation to the practical 
situation found in many large books and 
bound journals. The results obtained indi- 
cate that the marked curvature of lines of 
print near the inner margin of such volumes 
without doubt adversely affects readability. If
much larger inner margins were used, the 
situation would be much improved. 

One may assume that it is difficult to get 
printers to depart from the traditional prac- 
tice of employing narrow inner or gutter mar- 
gins and of wasting paper in wide outer and 
bottom margins. Nevertheless, in terms of 
hygiene of reading, much wider inner margins 
would make for easier perception and faster 
reading. The width of the outer, top, and 
bottom margins bears no relationship to the 
readability of the print. Esthetics, however, 
demands use of some marginal space. It 
would seem that an attractive printed page 
could be worked out with a rearrangement of 
marginal space so that the inner margin would 
be considerably wider. 







Summary 


1. This experiment was designed to investi- 
gate the effects of curved text upon speed of 
reading and upon visibility of word forms. 

2. The subjects were 104 college students, 
52 in each of the subgroups. 

3. The rate of reading curved print was 
significantly slower than for flat copy. 

4. Visibility or ease of perceiving words 
was reduced significantly for curved text in 
comparison with flat text. 

5. The retardation in speed of reading curved text seems to be due largely to reduced visibility of word forms. The need for constant changes in accommodation in reading the curved text may also be involved in reducing speed of reading the curved text.

6. It is suggested that wider inner margins 
be employed in large books and magazines to 
avoid the marked curvature of the printed 
page now present in such volumes. This 
change would improve readability of print 
markedly. 


Received September 19, 1956 
The Relationship of Style Difficulty, Practice, and Ability to
Efficiency of Reading and to Retention¹


George R. Klare 


Ohio University 
Emir H. Shuford 
United States Air Force 


and William H. Nichols 
The RAND Corporation 


This study is one of a series concerned with the learning and retention of printed technical material as related to certain “communication” variables. The earlier studies by the authors and associates (7, 8, 9, 10, 11)² involved 20 or 25 minutes of reading of technical passages, varied in definable ways, followed by a multiple-choice retention test. This procedure has generally been followed by other investigators, especially in examining the effect of style difficulty upon comprehension (see Klare and Buck [6]).

These studies necessarily involved the meth- 
odological problem of differences in amount 
read by different Ss in the allotted reading 
time. In the studies by the authors, Ss were 
permitted (and, in fact, encouraged) to re- 
read the passages as often as time permitted. 
Though Ss recorded amount read, there was 
some question of the reliability of these rec- 
ords. More important, the relationship of the 
various communication variables to retention 
may have been confounded by the variations 
in amount read by different Ss. 

Preliminary work with an eye-movement camera indicated its usefulness in obtaining control of amount read and, at the same time, securing accurate measures of reading efficiency. In the present experiment such a camera was used in obtaining the reading efficiency measures of words read per second and per fixation. Retention was indicated by scores on both modified recall and word recognition tests, whereas multiple-choice test scores had been used in the earlier studies.

¹ This research was supported in part by the United States Air Force under Contract No. AF 33(038)-25726, monitored by the Air Force Personnel and Training Research Center. Permission is granted for reproduction, translation, publication, use, and disposal in whole and in part by or for the United States Government.

The authors gratefully acknowledge the suggestions and contributions of Drs. L. H. Lanier and L. M. Stolurow of the University of Illinois, where the data for this study were collected.

² A sixth study, on the relationship of organization to the learning of technical material, is in press (Educational Res. Bull.).

Since the style difficulty of passages had 
been found to be related to retention in the 
earlier type of study (10), this independent 
variable was again selected for use. The sec- 
ond variable used was amount of practice 
(number of readings) of the technical pas- 
sages. Relatively little work has been done 
on the relationship between increased readings 
and amount of contextual material learned 
despite the frequent hortative use of this 
variable by instructors. 

The final variable studied was the ability 
of Ss. There is a considerable body of litera- 
ture in psychology which suggests that the 
ability variable can be expected to account 
for significant variance in almost any kind of 
learning study that is attempted. It might 
therefore seem reasonable that ability differ- 
ences would have a predictable effect upon 
both the reading efficiency and retention of 
Ss, but studies such as that of Dunn (3) in- 
dicate that this effect may not be a simple 
one. Dunn found that fixations during read- 
ing for his mentally normal and retarded Ss 
did not differ significantly, but that compre- 
hension scores for the former were signifi- 
cantly higher. 

In addition to the question of the relationship of these variables to reading efficiency and retention, a related question involved the possibility of interaction among the several variables themselves. The effect of style difficulty upon the retention of subjects of different levels of ability, and the relationship of style difficulty to practice, were thought to be of particular importance.


Method 
Apparatus 


The eye-movement camera used was the American 
Optical Company’s “Ophthalmograph” (a modified 
Dodge binocular type). For a description of the 
construction and use of the Ophthalmograph and 
other eye-movement cameras, see Carmichael and 
Dearborn (1).


Experimental Materials 


Reading material. The reading material used was 
a passage taken from a printed technical lesson used 
in the training of reciprocating engine mechanics in 
the Air Force. The content of the passage concerned 
the operation of the supercharger drain and return 
valve of an engine. 

Two versions of this content were used, termed E 
(“easy”) and H (“hard”). These versions were pre- 
pared from the standard lesson used in Air Force 
training by varying: (a) the percentage of short, 
familiar, frequently used words, (b) average sentence 
length (in words), and (c) the proportion of con- 
crete to abstract wordings. Technical experts served 
as judges in determining that only the style difficulty 
or structure of the material (“how said”) was 
changed, “content” (“what said”) and specific technical terms remaining constant.

Table 1 presents the results of style difficulty 
analyses of these versions. Word lengths are given 
along with counts of certain style elements and for- 
mula values. These latter are based on the read- 
ability formulas of Dale and Chall (2) and Flesch 
(4) and the level of abstraction formula of Flesch 
(5). The table shows that, while the various ele- 
ments and formulas disagree slightly, there is a clear 
difference in style difficulty between E and H.

Tests. Two retention tests were used, a modified 
recall test and a word recognition test. The modi- 
fied recall test consisted of four sentences with two 
blanks in each (i.e., a total of eight blanks). An attempt was made to keep the wording and presentation of the sentences different from that of either
version (E or H) in order to prevent bias. The 
words necessary to fill in the blanks correctly were 
common to both versions. The recognition test con- 
sisted of 18 words, nine of which were common to 
both versions and nine of which came from tech- 
nical training material other than these versions. Ss 
were instructed to check the nine words they re- 
membered reading, but to keep in mind that their 
score would be the number right minus the number 
wrong.
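The rights-minus-wrongs rule for the recognition test can be sketched as follows. The paper does not print the 18 words, so the word lists below are hypothetical placeholders, as is the function name.

```python
def recognition_score(checked, studied):
    """Rights minus wrongs: +1 for each checked word that appeared in the
    passage, -1 for each checked word that did not."""
    checked = set(checked)
    studied = set(studied)
    rights = len(checked & studied)
    wrongs = len(checked - studied)
    return rights - wrongs


# Hypothetical placeholders: nine words common to both versions and nine
# distractors from other technical training material.
STUDIED = {f"passage_word_{i}" for i in range(1, 10)}
DISTRACTORS = {f"other_word_{i}" for i in range(1, 10)}

if __name__ == "__main__":
    print(recognition_score(STUDIED, STUDIED))                # all nine correct
    print(recognition_score(STUDIED | DISTRACTORS, STUDIED))  # checks every word
```

Under this rule, checking all 18 words yields 9 − 9 = 0, so indiscriminate checking earns nothing, which is presumably why Ss were reminded of how the test would be scored.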
Table 1

Style Difficulty Analyses of Reading Materials

                                                 Version
Measure                                          E        H
Length (words)                                   76       75
Average sentence length (words)                  15       25
Average word length (syllables)                  1.31     1.73
Unfamiliar words* (per 100 words)                12       35
Definite words** (per 100 words)                 35       9
Dale-Chall readability values
    Raw score                                    6.28     10.40
    Estimated grade level                        7-8      16+
Flesch readability values
    Raw score                                    81       35
    Estimated grade level                        5-6      13-15
Flesch level of abstraction values
    Raw score                                    76       31
    Estimated grade level                        7-8      13-15

* Defined as words not on the Dale list (2).
** Defined as non-abstract according to Flesch's criteria (5).
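The raw readability scores in Table 1 can be checked against the two formulas as commonly published in 1948 (Flesch reading ease; Dale-Chall raw score). The coefficients below are those of the published formulas, not values stated in this paper, and the conversion of raw scores to grade levels (a lookup table) is not reproduced. With the Table 1 counts, the sketch gives Flesch values of about 81 (E) and 35 (H) and Dale-Chall raw scores of about 6.28 and 10.40.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_word):
    """Flesch (1948) reading ease; higher scores mean easier text.
    84.6 * syllables per word equals 0.846 * syllables per 100 words."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def dale_chall_raw(percent_unfamiliar, words_per_sentence):
    """Dale-Chall (1948) raw score; percent_unfamiliar is the number of
    words per 100 that are not on the Dale list."""
    return 0.1579 * percent_unfamiliar + 0.0496 * words_per_sentence + 3.6365


# Style counts for the two versions, taken from Table 1.
VERSIONS = {
    "E": {"sentence_len": 15, "syllables": 1.31, "unfamiliar": 12},
    "H": {"sentence_len": 25, "syllables": 1.73, "unfamiliar": 35},
}

if __name__ == "__main__":
    for name, v in VERSIONS.items():
        fre = flesch_reading_ease(v["sentence_len"], v["syllables"])
        dc = dale_chall_raw(v["unfamiliar"], v["sentence_len"])
        print(f"{name}: Flesch {fre:.0f}, Dale-Chall {dc:.2f}")
```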


Subjects and Procedure 


The Ss were 120 male airmen in a mechanics course 
at Chanute AFB, Illinois. Half of the Ss were “high” 
in mechanical aptitude (56 had a mechanical aptitude
“stanine” of nine and 4 of eight), and half were 
“low” (54 had a “stanine” of four and 6 of three). 
Each S was tested individually according to the procedure which follows.

The S was placed in position, the camera was adjusted and focused, and instructions were read. These instructions included the direction to read “at your normal rate” and the information that a “difficult” test would follow the readings. A passage other than the experimental passage was then exposed, and S read it and subsequently took a difficult recall test covering this passage (while seated at a table away from the camera). Following this, the camera was again adjusted and focused, and S read one version (E or H) of the experimental passage one or three times. He was then seated again at the table and took the recognition and recall tests, in that order.

A 2 × 2 × 2 factorial design (E vs. H style versions × high vs. low aptitude × one vs. three readings) was used in randomly assigning Ss to treatments. Reading times (to .2 sec.) were recorded by
stopwatch, and fixations were defined as all vertical 
lines on the film. Earlier studies had shown that 
stopwatch measurement of time agreed closely with 
that taken directly from the developed film, and was 
considerably more convenient. In order to account 
for the difference of one word in the length of the 
two versions (E contained 76 and H 75 words, as 
noted in Table 1), the measure adopted for use was 
words read per second. The earlier studies had also indicated that separate analysis of the infrequent regressions of the Ss added little to the information to be gained from an analysis of fixations, and therefore the more convenient method of counting all vertical lines as fixations was adopted. As before, these counts were transformed to words read per fixation to account for the slight difference in length of the experimental versions.

The rationale for providing a preliminary reading prior to the introduction of the experimental passage was not so much to familiarize Ss with the camera as to induce a strong “set to learn.” Earlier studies had indicated that such a set could not easily be provided by verbal instructions alone but could be induced readily by the use of a passage and a difficult test preceding the experimental reading. With a weaker set to learn, scores on retention tests frequently proved too low or too unreliable for the desired analyses.


Results


Of the 120 Ss in the experiment, 60 were of low ability and 60 of high ability. The eye-movement records of 20 of the low-ability Ss could not be analyzed due to the poor quality of their fixations and/or saccadic movements, which left 40 usable fixation records.³ The high-ability group was then also randomly reduced to 40 fixation records for analysis, making a total of 80 records. Time records were, of course, available for all 120 Ss since they were secured independently of the film records.

³ The poor records seemed to be due primarily to the droopy eyelids of the low-ability Ss, which suggests a promising measure of intelligence. Lest this be carried further, however, the experimenters hasten to add that the droop for most Ss seemed to increase from morning to afternoon, which creates obviously unfortunate implications for the notion of constancy of the IQ.


Table 2

F Ratios from Four Analyses of Variance of Reading Efficiency and Retention Measures

                                        Reading Efficiency Measures     Retention Measures
Source                          df      Words/Second   Words/Fixation   Recall      Recognition
Versions                        1       8.11**         5.61*            5.23*       1.31
Readings                        1       3.73           6.57*            14.53***    28.56***
Ability                         1       23.50***       7.93**           32.08***    24.63***
Versions × readings             1
Versions × ability              1
Readings × ability              1
Versions × readings × ability   1
Within                          112†
Total                           119††

* .01 < p < .05
** .001 < p < .01
*** p < .001
† Within df for words/fixation = 72.
†† Total df for words/fixation = 79.

Inspection of the results of the study indicated that all of the mean values were in the predicted directions. That is, an easy style, three readings, and high ability produced greater reading efficiency (on both words/second and words/fixation measures) and greater retention (on both recall and recognition tests) than a hard style, one reading, and low ability. Four 2 × 2 × 2 analyses of variance were computed on these data, one for each of the dependent variables. Results of these analyses are presented in Table 2.
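Each of these analyses partitions one dependent measure into seven single-degree-of-freedom effects (three main effects and four interactions) plus a within-cells error term. The pure-Python sketch below shows that computation for a 2 × 2 × 2 between-subjects design with equal cell sizes, using the standard contrast formula for two-level factors. The paper does not say how its F ratios were computed, and the data in the check are made up. With N Ss in eight cells the within df is N − 8: 112 for all 120 time records and 72 for the 80 fixation records.

```python
from itertools import product
from statistics import fmean

FACTORS = ("versions", "readings", "ability")


def factorial_2x2x2_f(cells):
    """F ratios for a 2 x 2 x 2 between-subjects design.

    cells maps (version, readings, ability), each coded 0 or 1, to the
    list of scores in that cell. This sketch assumes equal cell sizes.
    """
    n = len(cells[(0, 0, 0)])
    if len(cells) != 8 or any(len(s) != n for s in cells.values()):
        raise ValueError("sketch assumes all 8 cells with equal n")
    totals = {key: sum(scores) for key, scores in cells.items()}
    # Within-cells (error) sum of squares and its degrees of freedom.
    ss_within = sum((x - fmean(s)) ** 2 for s in cells.values() for x in s)
    df_within = 8 * (n - 1)
    ms_within = ss_within / df_within
    f_ratios = {}
    # Each non-empty subset of factors is a main effect or an interaction.
    for mask in product((0, 1), repeat=3):
        if not any(mask):
            continue
        contrast = 0.0
        for key, total in totals.items():
            sign = 1
            for used, level in zip(mask, key):
                if used:
                    sign *= 1 if level else -1
            contrast += sign * total
        ss_effect = contrast ** 2 / (8 * n)  # every effect has df = 1
        name = " x ".join(f for f, used in zip(FACTORS, mask) if used)
        f_ratios[name] = ss_effect / ms_within
    return f_ratios, df_within


if __name__ == "__main__":
    # Made-up data in which only the "versions" factor shifts the scores.
    demo = {k: [11 + 2 * k[0], 9 + 2 * k[0]] for k in product((0, 1), repeat=3)}
    f, df = factorial_2x2x2_f(demo)
    print(df, f)
```

With unequal cell sizes (for example, if the fixation records had not been balanced), this shortcut no longer applies and a general least-squares approach would be needed.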

Examination of this table shows that all three independent variables (style difficulty, practice, and ability) produced significant differences in the words/fixation measure. Style difficulty and ability also produced significant differences in the words/second variable, but readings did not. Even in the case of readings, however, the means were in the expected rank order, and a previous unpublished study showed a significant gain in words/second for increased readings. Since there seemed to be a rather large amount of variance attributable to the readings × ability interaction (see Table 2), the main effect of readings may have been somewhat obscured. Inspection of the means showed that the high-ability Ss gained considerably more words/second between one and three readings than did the low-ability Ss.

Table 2 also shows that all three independ- 
ent variables produced significant differences 
in scores on the recall test, but only readings 
and ability produced significant differences in 
recognition test scores. The reason for this 
may well lie in the differences in the tests, 
the recall test being more nearly a “compre- 
hension” measure than the recognition test. 
It will be remembered that in the recall test 
S was required to fill in blanks in unfamiliar 
sentences (i.e., sentences different from either 
version). In the recognition test, on the other 
hand, S was merely required to indicate 
whether or not he had seen a word previously 
in his reading. Since the words that were 
seen were identical in both the E and H versions, style difficulty might not be expected to produce a difference in scores (whereas number of readings might).


Discussion 


As noted earlier, a previous study of style 
difficulty (10) indicated that an “easy” style 


produces greater retention than a “hard” 
style. The effect of style difficulty was con- 
founded, however, since Ss were encouraged 
to reread given passages if a first reading was 
completed before the allotted reading time 
was up. There was a significant tendency for 
Ss to read the easy style faster than the hard, 
and it was felt that the extra reading thus 
provided may have been at least partially re- 
sponsible for the increased retention scores. 
The present study, in which number of read- 
ings was used as a control rather than a set 
amount of reading time, showed that the easy 
style still produced significantly greater re- 
tention. This conclusion must be qualified to 
the extent that this difference appeared when 
retention was measured with a modified recall 
(“comprehension”) test but not with a recognition test. The easy style also produced
greater reading efficiency than the hard style, 
as measured in terms of both words read per 
fixation and words read per second. 


The present study also showed that an in- 
crease in the number of readings of a pas- 
sage results in increased reading efficiency and 
increased retention. The effect upon reten- 
tion seemed the more clear-cut; more able 
Ss appeared to profit relatively more in effi- 
ciency with increased readings than did the 
less able. 

As for differences in ability of readers as 
such, the study showed that readers of higher 
ability read more efficiently and with greater 
retention than did readers of lower ability. 
Dunn (3), it will be recalled, found a sig- 
nificant difference in comprehension scores fa- 
voring normal over feeble-minded Ss, but no 
difference in number of fixations. In the pres- 
ent study a strong set to learn was induced in 
Ss by a preliminary reading and difficult test 
preceding reading and testing of the experi- 
mental passage. Number of words read per 
fixation and per second were recorded for the 
preliminary reading, which permitted a com- 
parison of values with the experimental read- 
ing. In the former situation, the number of 
words read per fixation by the more able and 
less able Ss did not differ significantly as they
did in the latter situation. These differences 
in the preliminary and experimental situations 
appear to be due to the change in set, the test- 
induced set being stronger than the set pro- 
vided by verbal instructions only. It appears 
possible that Dunn's results may have been 
due to his Ss having had a relatively weak set 
to learn. 


Summary 


Two groups of Ss, one of high and one of 
low mechanical ability, read a technical pas- 
sage before an eye-movement camera. Two 
alternate versions of the passage were used, 
one written in an “easy” and one in a “hard” 
style, and Ss were given either one or three 
readings. Reading efficiency was measured in 
terms of number of words read per second 
and number read per fixation, and retention 
was measured with scores based on modified 
recall and word recognition tests. A strong 
“set to learn” was induced in Ss by having 
them read a preliminary passage and take a 
difficult test over it prior to the experimental 
reading. 
The results of the study indicated that the easy style produced significantly higher scores than did the hard style on the words/fixation, words/second, and recall measures. Three readings, as compared to one, resulted in significant increases in the words/fixation, recall, and recognition measures. The Ss of high
ability received significantly higher scores 
than did those of low ability on all four de- 
pendent measures; it was noted, however, that 
further analysis suggested that the significant 
difference between the two groups in words/ 
fixation may well have been due to the use of 
a strong set to learn. 


Received November 19, 1956. 


References 


1. Carmichael, L., & Dearborn, W. F. Reading and visual fatigue. New York: Houghton Mifflin, 1947.

2. Dale, E., & Chall, Jeanne S. A formula for predicting readability: instructions. Educ. Res. Bull., 1948, 27, 37-54.

3. Dunn, L. M. C. A comparative study of mentally retarded and mentally normal boys of the same mental age on some aspects of the reading process. Unpublished doctoral dissertation, Univer. of Illinois, 1953.

4. Flesch, R. A new readability yardstick. J. appl. Psychol., 1948, 32, 221-233.

5. Flesch, R. Measuring the level of abstraction. J. appl. Psychol., 1950, 34, 384-390.

6. Klare, G. R., & Buck, B. Know your reader. Camden, N. J.: Thomas Nelson, 1954.

7. Klare, G. R., Gustafson, L. M., Mabry, J. E., & Shuford, E. H. The relationship of immediate retention of technical training material to career preferences and aptitudes. J. educ. Psychol., 1955, 46, 321-329.

8. Klare, G. R., Mabry, J. E., & Gustafson, L. M. The relationship of human interest to immediate retention and to acceptability of technical material. J. appl. Psychol., 1955, 39, 92-95.

9. Klare, G. R., Mabry, J. E., & Gustafson, L. M. The relationship of patterning (underlining) to immediate retention and to acceptability of technical material. J. appl. Psychol., 1955, 39, 40-42.

10. Klare, G. R., Mabry, J. E., & Gustafson, L. M. The relationship of style difficulty to immediate retention and to acceptability of technical material. J. educ. Psychol., 1955, 46, 287-295.

11. Klare, G. R., Nichols, W. H., & Shuford, E. H. The relationship of typographic arrangement to the learning of technical training material. J. appl. Psychol., 1957, 41, 41-45.





Journal of Applied Psychology 
Vol. 41, No. 4, 1957 


The Comparative Effectiveness of Electric and Manual
Typewriters in the Acquisition of Typing Skill in a
Navy Radioman School¹
The capabilities of electric typewriters in
speed and ease of operation exceed those of 
manual typewriters. For this reason, many 
business education authorities believe that 
typing students will reach a higher proficiency 
level in less time when electrics are used for 
training and will experience little difficulty in 
transferring from electric to manual machines. 
Winger (2) reviews and summarizes a num- 
ber of classroom comparisons of the two kinds 
of typewriters, all of which are in agreement 
with these beliefs. However, in an Air Force 
study (3) it was found that using electric 
typewriters was not advantageous in teaching 
beginning typists. 

Although many of the reports favoring elec- 
tric typewriters are based on subjective opin- 
ion rather than controlled experimentation 
with significance tests of the results, the pos- 
sible advantages of the electric typewriter as 
a teaching device justify experimental com- 
parisons. The Navy has a critical need for 
efficient manpower utilization. Typing skill 
is a performance requirement for many Navy 
ratings. Improvement in typing instruction 
might reduce the total training time in a num- 
ber of Navy schools. On the other hand, 
where it is found that electric typewriters are 
not superior teaching devices, the Navy can 
effect a large, justifiable monetary saving by 
continuing to use manual typewriters. 

The present study was requested by the Service School Command, Naval Training Center, San Diego, California. This Command considers the typing instruction problem to be especially critical in its Radioman (RM) School. Therefore, it was decided to limit this study to investigating the acquisition of the typing skills required of radiomen rather than conventional typing skills. Radiomen use typewriters for Morse code reception, and trainees are taught typing during the first four weeks of their course. Since effective code-taking speed cannot exceed typing speed, code-speed acquisition may be retarded by lack of typing skill.

¹ A report of this experiment was presented at the 1956 American Psychological Association Convention in Chicago.

² This study was conducted while the author was a member of the U. S. Naval Personnel Research Field Activity, San Diego. The opinions expressed are solely those of the author and are in no way official; nor are they to be construed as representing those of the U. S. Naval Personnel Research Field Activity or Bureau of Naval Personnel.

Accordingly, this experiment was designed 
to investigate two specific hypotheses: 

1. Radioman trainees will achieve more 
proficiency in typing mixed groups of letters 
and digits (cipher groups) when they are 
trained on electric typewriters than when they 
are trained on manual typewriters. 

2. The typing skill acquired on electric 
typewriters will be rapidly transferred to 
manual typewriters. 


Procedures 


With 20 electric typewriters of the same make available, it was possible to train 20 experimental Ss at one time. Therefore the control group was also limited to 20 Ss at a time.

From Class 8, which entered the Radioman School on 17 October 1955, the 40 students with the least typing skill, as determined by a 2-min. plain language pretest, were selected. These 40 students were stratified into two categories: those without previous typing experience and those with previous typing experience. Within these strata, students were randomly divided into an experimental group (to be trained on electric typewriters) and a control group (to be trained on manual typewriters). Of the 20 students in each group, 10 were typists and 10 were nontypists. Separate rooms were used for training the electric and manual typewriter groups. Both kinds of typewriters had standard keyboards.

The assignment procedures, and the subsequent training procedures, were replicated with 40 students from Class 10, which entered school on 14 November 1955. One typist was hospitalized during the course of the experiment, leaving a total of 39 Ss in the experimental group and 40 Ss in the control group.
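The selection and stratified random split described above (the 40 lowest pretest scorers, then half of each stratum to electric and half to manual machines) can be sketched in Python as follows. The student identifiers, the seed, and the function names are hypothetical, and the paper does not say what randomizing device was actually used.

```python
import random


def select_least_skilled(pretest, k=40):
    """Return the k students with the lowest pretest scores.

    pretest maps a (hypothetical) student id to a 2-min. typing score.
    """
    return sorted(pretest, key=pretest.get)[:k]


def stratified_assignment(typists, nontypists, seed=None):
    """Randomly split each stratum in half: electric vs. manual.

    With an odd-sized stratum, the extra student goes to the manual group.
    """
    rng = random.Random(seed)
    groups = {"electric": [], "manual": []}
    for stratum in (typists, nontypists):
        members = list(stratum)
        rng.shuffle(members)
        half = len(members) // 2
        groups["electric"].extend(members[:half])
        groups["manual"].extend(members[half:])
    return groups


if __name__ == "__main__":
    typists = [f"T{i:02d}" for i in range(20)]      # hypothetical ids
    nontypists = [f"N{i:02d}" for i in range(20)]   # hypothetical ids
    groups = stratified_assignment(typists, nontypists, seed=8)
    for name, members in groups.items():
        n_typists = sum(m.startswith("T") for m in members)
        print(name, len(members), "students,", n_typists, "typists")
```

Splitting within each stratum, rather than across the whole class, guarantees the 10 typists and 10 nontypists per group reported above, whatever the random draw.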

The experimental and control group means were 
equivalent for age (18.21 vs. 18.28), years of education (11.79 vs. 11.75), general ability as determined
by scores on the Navy general classification and 
arithmetic tests (113.74 vs. 111.55), and code apti- 
tude as determined by scores on the radio code apti- 
tude test used by the Navy (65.44 vs. 66.18). 

Each typing class was taught by one instructor 
who remained with the class for the entire four weeks 
of typing instruction. For Class 10, the Class 8 elec- 
tric and manual typing instructors were switched. 
These two instructors were requested to follow their 
customary teaching methods throughout the eight 
weeks of the experiment. 

The Radioman School completes its typing instruction in the first four 5-day weeks of the 16-week course. There are eight 45-min. class periods each
day. For the first two weeks, typing class periods 
alternate with “rectype” class periods (during which 
instruction in code reception is given). During the 
third and fourth weeks one of the previous typing 
periods is used for instruction in radio procedures. 
Thus beginning students receive four periods of typ- 
ing instruction per day for the first two weeks and 
three periods of typing instruction per day for the 
third and fourth weeks. 

The Ss received three weeks of basic typing in- 
struction on their respective machines (Part I of the 
experiment). In the fourth week the experimental 
Ss were taught to use manual typewriters while the 
control Ss continued practice on the manual type- 
writers (Part II of the experiment). Since radio- 
men aboard ship normally receive code on manual 
typewriters, it was considered important to obtain 
information regarding progress in typing proficiency 
during the period of transfer from electric typewriters to manual typewriters.

From the 36-character alphabet (26 letters and 10 
digits), 250 characters were selected by the use of a 
table of random numbers. These 250 characters were 
divided into fifty 5-character mixed groups of letters 
and digits for use as 2-min. tests. Five different ar- 
rangements of these 50 mixed groups were made. 
Three other arrangements of the same 50 mixed 
groups constituted the 150-group “final examina- 
tion” (a 6-min. test with 15-sec. rest intervals every 
2 min.). These tests are considered to be equiva- 
lent, with the different arrangements eliminating 
practice effects. 
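The construction of the cipher-group tests can be sketched in Python. This is a minimal illustration under stated assumptions: characters are drawn with replacement (250 draws from 36 symbols must repeat some), an "arrangement" is taken to mean a reshuffle of the order of the 50 groups, and the seeds and function names are hypothetical rather than part of the original procedure.

```python
import random
import string

# 26 letters + 10 digits = the 36-character alphabet described in the text
ALPHABET = string.ascii_uppercase + string.digits


def make_cipher_groups(seed: int, n_chars: int = 250, group_size: int = 5) -> list[str]:
    """Draw n_chars characters at random (with replacement) and cut them
    into consecutive mixed groups of group_size characters."""
    rng = random.Random(seed)
    chars = [rng.choice(ALPHABET) for _ in range(n_chars)]
    return ["".join(chars[i:i + group_size]) for i in range(0, n_chars, group_size)]


def arrangement(groups: list[str], seed: int) -> list[str]:
    """Return the same groups in a new random order (one test 'arrangement')."""
    rng = random.Random(seed)
    reordered = list(groups)
    rng.shuffle(reordered)
    return reordered


base_groups = make_cipher_groups(seed=1955)  # fifty 5-character mixed groups
two_minute_tests = [arrangement(base_groups, seed=k) for k in range(5)]
# Final examination: three further arrangements of the same 50 groups, 150 in all
final_exam = [g for k in range(5, 8) for g in arrangement(base_groups, seed=k)]

print(" ".join(two_minute_tests[0][:10]))
print(len(final_exam), "groups in the final examination")
```

Because every form is a reordering of one fixed set of groups, the forms are equivalent in content while differing in sequence, which is the property the text relies on to limit practice effects.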

From the last day of the first week of training 
until the last day of the third week, Ss were given 
one of the 2-min. tests at the close of the final typ- 
ing period each day. There were 10 of these test 
administrations. The 6-min. Part I final examina- 
tion was administered at the close of the final typing 
period on the last day of the third week. 

During the fourth week (when the group trained 
on electric typewriters learned to use manual type- 
writers and the group trained on manual typewriters 


Henry L. Adams 


continued practice on manual typewriters), a 2-min. 
test was given at the close of every typing class up 
to the last class. There were also 10 of these test 
administrations. The 6-min. Part II final examina- 
tion was given at the close of the final typing class. 

For all the above tests a combination speed and 
accuracy score was obtained: the number of “words” 
typed per minute (with five strokes, including the 
space, counted as a word) minus the number of 
errors per minute. 
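As a worked example of this scoring rule, here is a minimal Python sketch; the function name and the stroke and error counts are hypothetical.

```python
def typing_score(strokes: int, errors: int, minutes: float) -> float:
    """Speed-accuracy score: 5-stroke "words" per minute (spaces count
    as strokes) minus errors per minute."""
    if minutes <= 0:
        raise ValueError("test duration must be positive")
    words_per_minute = (strokes / 5) / minutes
    errors_per_minute = errors / minutes
    return words_per_minute - errors_per_minute


# Hypothetical 2-min. test: 200 strokes (40 "words") with 6 errors
print(typing_score(strokes=200, errors=6, minutes=2))  # 17.0
```

Here 40 words in 2 min. gives 20 words per minute, and 6 errors in 2 min. gives 3 errors per minute, for a score of 17.0.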

The general aim of the experimental procedures 
was to control unnecessary between-groups variation 
in instructional procedures, content, and other fac- 
tors affecting learning so that differences in profi- 
ciency between the groups would be due to differ- 
ences in the training devices and/or necessarily con- 
founded procedures. 


Table 1 


Mean Scores of the Electric Typewriter Ss and Manual 
Typewriter Ss on the Series of Typing 
Proficiency Tests 


Mean Score 

of Electric of Manual 

Typewriter Typewriter 
Ss Ss 


Mean Score 


Part I 


7.13 

9.95 
10.78 
11.55 
12.48 
12.73 
14.03 
14.93 
15.43 
15.60 


Final Exam 16.05 
Part II
11.79 
13.36 
13.64 
14.67 
14.82 
14.72 
17 16.28 
18 16.18 
19 17.10 
20 17.38 


15.20 
16.05 
16.00 
16.80 
17.23 
16.35 
17.63 
18.10 
18.55 
18.68 
16.90 


Final Exam 18.55 


Note. Scores on all tests were the number of 5-stroke “words” typed per minute minus the number of errors per minute.
Results and Discussion


As previously stated, there were ten 2-min. 
tests and a 6-min. final test given during Part 
I of the experiment and ten 2-min. tests and 
a 6-min. final test given during Part II of the 
experiment. As estimates of the reliability of 
these tests at three different points in train- 
ing, test-retest correlation coefficients were ob- 
tained. The correlation between scores for all 
Ss on Test 3 and Test 4 was .79. For scores 
of all Ss on the second and third 2-min. sec- 
tions of the Part I final examination, the cor- 
relation was .84, and for scores on the second 
and third 2-min. sections of the Part II final 
examination, the correlation was .81. The 
four test-retest coefficients for the electric and 
manual experienced and nonexperienced cate- 
gories at each of these three points in training 
were examined and judged sufficiently homogeneous to permit combining them into the above over-all coefficients.
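The test-retest coefficients are ordinary product-moment correlations between two sets of scores from the same Ss. A minimal Python sketch follows; the score lists are invented for illustration and are not data from the study.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation between two paired score lists,
    e.g. the same Ss' scores on Test 3 and on Test 4."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists of equal length (n >= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five Ss on two successive 2-min. tests
test3 = [9.5, 11.0, 12.5, 14.0, 15.5]
test4 = [10.0, 10.5, 13.5, 13.0, 16.0]
print(round(pearson_r(test3, test4), 2))
```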

Means of the scores on all tests for the two 
groups of Ss are presented in Table 1. These 
means are depicted graphically in Figure 1. 

As Figure 1 indicates, while the electric 
typewriter group showed a substantial decre- 
ment in performance on the first test after 
changing to manual typewriters (Test 11), 
there was considerable positive transfer of 
training from the initial test (Test 1). For 
measuring transfer, only the scores for the 20 
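electric nontypists and the 20 manual nontypists are appropriate. Using the Gagné, Foster, and Crowley (1) formula,

$$\text{Per cent transfer} = \frac{\text{Experimental Group Mean on Test 11} - \text{Control Group Mean on Test 1}}{\text{Control Group Mean on Test 11} - \text{Control Group Mean on Test 1}} \times 100$$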


Table 2

Analysis of Variance of Scores on the Part I and Part II Final Examinations

                                   Part I                 Part II
Source of Variation     df    Mean Square      F      Mean Square
Typewriter†              1       12.50        1.07        53.39
Experience               1      112.50        9.62**     122.72
Instructor               1        3.56         .30         6.72
Class (T × I)            1      144.50       12.35**     206.72
T × E                    1       14.22        1.22        24.50
E × I                    1       12.50        1.07         1.39
C × E                    1       49.50        4.27*       34.72
Error                   64       11.70                    14.02

* Significant at .05 level of confidence: F (1, 64; .05) ≥ 4.09.
** Significant at .01 level of confidence: F (1, 64; .01) ≥ 7.038.
† For the typewriter comparison the test is of the null hypothesis μE ≤ μM against the alternative hypothesis μE > μM. This is a 1-tailed test for which the critical point is F ≥ 2.79 for α = .05 and F ≥ 5.71 for α = .01. This is equivalent to a critical point for t of +1.67 and +2.39, respectively. For the difference X̄E − X̄M, the obtained t's for the Part I and Part II final examinations are +1.08 and −1.94, respectively. Both of these t's require acceptance of the null hypothesis.


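with the applicable nontypist means, the per cent transfer was

$$\frac{10.05 - \bar{C}_1}{13.20 - \bar{C}_1} \times 100 = 58\%,$$

where $\bar{C}_1$ is the manual (control) nontypist mean on Test 1.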
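The transfer formula can be written as a small Python function. This sketch assumes, as the layout of the formula indicates, that 10.05 and 13.20 are the experimental and control nontypist means on Test 11; the function names are hypothetical. Solving the formula backward shows that, because 58% is rounded, any control Test 1 mean between roughly 5.6 and 5.8 reproduces it.

```python
def percent_transfer(exp_test11: float, ctrl_test11: float, ctrl_test1: float) -> float:
    """Per cent transfer as written in the text:
    (E on Test 11 - C on Test 1) / (C on Test 11 - C on Test 1) * 100."""
    control_gain = ctrl_test11 - ctrl_test1
    if control_gain == 0:
        raise ValueError("control group shows no gain; per cent transfer is undefined")
    return (exp_test11 - ctrl_test1) / control_gain * 100


def implied_ctrl_test1(exp_test11: float, ctrl_test11: float, pct: float) -> float:
    """Solve the same formula for the control Test 1 mean, given a per cent transfer."""
    p = pct / 100
    return (exp_test11 - p * ctrl_test11) / (1 - p)


# Nontypist Test 11 means from the text; the control Test 1 mean is back-solved
c1 = implied_ctrl_test1(10.05, 13.20, 58.0)
print(round(c1, 2))
print(round(percent_transfer(10.05, 13.20, c1), 1))
```

The denominator is the control group's own gain from Test 1 to Test 11, so the result expresses how much of that gain the electric-trained group showed on its first manual test.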

Group differences in scores on the Part I final examination and on the Part II final examination were tested by analysis of variance techniques. For the analyses, Ss’ scores were partitioned by typewriter, experience, class, and instructor (class and instructor being necessarily confounded). This confounding is unimportant for the purposes of the study, however, since only the typewriter comparison is of primary interest. To obtain an equal number of cases in each category after losing one S through hospitalization, an S from each of the other seven categories, with the same rank on the test immediately preceding the hospitalization, was discarded, leaving nine Ss in each of the eight categories. The results of this analysis are presented in Table 2.

Table 2 shows that there were no practical 
differences between the electric and manual 
typewriter Ss on either final examination. 
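Each F ratio in Table 2 is the effect mean square divided by the error mean square, and for a 1-df effect the corresponding t is the square root of F, signed by the direction of the mean difference. The Python sketch below is an arithmetic check on the tabled mean squares, not a re-analysis; the dictionary labels are illustrative. Recomputed this way, the Part I ratios agree with the printed values except C × E (49.50/11.70 ≈ 4.23 against a printed 4.27), and the typewriter t's come out near +1.03 and −1.95.

```python
import math

# Mean squares transcribed from Table 2 (every effect has 1 df; error has 64 df)
PART_I = {"Typewriter": 12.50, "Experience": 112.50, "Instructor": 3.56,
          "Class (T x I)": 144.50, "T x E": 14.22, "E x I": 12.50, "C x E": 49.50}
PART_I_ERROR = 11.70
PART_II = {"Typewriter": 53.39, "Experience": 122.72, "Instructor": 6.72,
           "Class (T x I)": 206.72, "T x E": 24.50, "E x I": 1.39, "C x E": 34.72}
PART_II_ERROR = 14.02

# Sign of (electric mean - manual mean) on each final, as reported in the Table 2 note
DIRECTION = {"Part I": +1, "Part II": -1}
T_CRITICAL = 1.67  # 1-tailed critical t at alpha = .05, from the Table 2 note


def f_ratios(mean_squares: dict[str, float], ms_error: float) -> dict[str, float]:
    """F ratio for each source: effect mean square / error mean square."""
    return {source: ms / ms_error for source, ms in mean_squares.items()}


for part, table, ms_error in (("Part I", PART_I, PART_I_ERROR),
                              ("Part II", PART_II, PART_II_ERROR)):
    f = f_ratios(table, ms_error)
    # With 1 df, |t| = sqrt(F); the sign follows the direction of the mean difference
    t = DIRECTION[part] * math.sqrt(f["Typewriter"])
    verdict = "reject" if t > T_CRITICAL else "retain"
    print(part, {k: round(v, 2) for k, v in f.items()})
    print(f"  typewriter t = {t:+.2f}; {verdict} the null hypothesis (mu_E <= mu_M)")
```

The Part II t is larger than 1.67 in absolute value, but it favors the manual group, so the 1-tailed test of an electric advantage still retains the null hypothesis.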

It was concluded that, under the current 
operational conditions at the San Diego RM 
School, electric typewriters offer no advan- 
tages over manual typewriters for typing in- 
struction. It should be emphasized that in- 
ferences regarding typing instruction under 
different practical training conditions (e.g., 
where it is necessary to continue typing in- 
struction until students reach a higher pro- 
ficiency level) are not warranted on the basis 
of this experiment. 


Summary 


Electric and manual typewriters were com- 
pared as teaching devices under the current 
instructional operating conditions at the San 
Diego Radioman School. Experimental and
control groups were trained on electric and 
manual typewriters, respectively, with the ex- 
perimental groups switching to manual type- 
writers for the last fourth of the training. 
Typing proficiency was measured by a series 
of tests composed of cipher groups. It was 
found that students trained on manual type- 
writers performed as well as students trained 
on electric typewriters. There was consider- 


able positive transfer of training from electric 
to manual typewriters but direct practice on 
manual typewriters was preferable. 


Received September 20, 1956. 
While the Leaderless Group Discussion
(LGD) method for predicting future leader- 
ship ability has its roots in military selection 
(1), it has never found widespread use in the U.S. Army, in spite of its excellent face validity and moderate-to-good criterion validity (2). The relative lack of economy of such a technique is probably the main reason for this lack of interest: for only moderate ability to predict leadership, the device is rather demanding of time, physical facilities, and trained observers.

As Bass (2) points out, the basic and tra- 
ditional scheme of the LGD is to ask a group 
of examinees, as a group, to carry on a dis- 
cussion for a given period of time. No one is 
appointed as leader. Two or more examiners 
observe the interactions of the group and rate 
the leadership abilities of each member, usu- 
ally by means of a check list. The examiners 
do not participate in the discussion. The 
check lists usually include such items as initiative, effectiveness in the presentation of
ideas, and observable influence on the other 
members. The experimental design in LGD 
studies has varied in the number of raters, 
length of discussion time, and the directions 
given to the group. Seating arrangement and 
number of men in each LGD group have also 
varied in the different reported studies. Quantitative treatment of the observer check list generally yields a score that represents the individual member’s status in that particular LGD group. These
scores are then correlated with various ex- 
ternal criteria of leadership in an attempt to 
validate the technique. 

The present study is concerned with an attempt to alter the traditional LGD technique so as to make this predictive tool simpler and more economical to use. A modification is needed which permits the evaluation


¹ Formerly at Mental Hygiene Consultation Service, Fort Knox, Kentucky.


of a great many LGD groups at the same 
time. The object of this study is to deter- 
mine how the substitution of a buddy-rating 
technique for the presence of specially trained 
observers affects the correlation of LGD score 
with an external criterion of leadership. Such 
a technique resembles, in part, a method of leadership assessment developed by Bell and French (4).


Experimental Method 


The 10 classes of students investigated at the Fort Knox Leaders Course were composed of basic training graduates at Fort Knox, Kentucky. This population is selective in terms of intelligence and Army performance. The students at Leaders Course are given theoretical and practical training in leadership methods and techniques, as well as in allied subjects. At the end of an eight-week period, they are given a final leadership score which represents the sum of the following four measures: (a) a faculty rating consisting of a personality trait-type rating and an over-all assessment of leadership ability, (b) a peer or buddy rating given by fellow leadership students, (c) a trait and leadership rating made on the basis of performance during a period of applicatory training, and (d) an OSS-type situational test in which the individual is asked to perform in various military situations requiring tactical knowledge and judgment. The composite score constitutes the external criterion of leadership. In the present sample the contribution of the various components to the composite criterion measure is as follows: faculty rating (40%), applicatory rating (26%), situational test (24%), and buddy rating (19%).

The use of a buddy rating in both the external validity criterion and the predicting device reduces total variance. It should be stated, however, that the author operated within the confines of an already established training and rating system. The
use of buddy rating as a component of final score in 
Leaders Course was satisfactory to the military, and 
as Wurster and Bass (5) point out, this is one of 
the most valuable predictors of leadership ability. 

On the afternoon of the first full day of their as- 
signment to Leaders Course, the class was taken to 
a large hall. The author assigned each member to a 
circle of chairs equipped with rigid built-in writing 
surfaces. These chairs were arranged in groups of 
eight (usually) in an approximate circle. Any two 
circles were separated by a distance of 20 to 30 ft. 
Table 1

Intercorrelations Between Components of the Final Leadership Criterion Measure

Components                                          Correlation
Faculty Rating vs. Applicatory Rating                  .23**
Faculty Rating vs. Buddy Rating                        .54**
Faculty Rating vs. Situational Test Rating             .39**
Applicatory Rating vs. Buddy Rating                    .25**
Applicatory Rating vs. Situational Test Rating
Buddy Rating vs. Situational Test Rating               .14**

** Significant at the 1% level of confidence.


As each man entered the hall he was dispatched to 
a specific circle, every eighth man being assigned to 
the same circle or LGD (eight-man groups). Since 
the author was working within an ongoing, already 
established system, a perfect design (in this case 
equal numbers in all LGD groups) was not possible. 
Thus, in the total sample, 13 groups of nine subjects 
each, and 2 groups of seven subjects each were used. 
The remainder of the groups were composed of eight men.

The assembled groups were then told the purpose 
of the meeting by the author and were informed that 
this was a routine evaluation procedure of the school. 
The mass LGD period was begun by the author’s 
addressing the group as follows: 

“Part of your duties as future leaders will involve 
working in groups and arriving at solutions accept- 
able to the group. You are, as a group, to discuss 
the following topic and arrive at a group solution. 
One man is to be chosen by the group to write the 
solution. The topic of your discussion is as follows: You are given the job of training men to be leaders. What problems might arise, and how would you take
care of these problems? I want you to work on this 
problem, as a group, and at the end of 50 minutes 
present a written solution which has been worked 
out by the group. Devote the last five minutes of 
the period to the writing of the solution.” 

After the discussion period was over and the written solutions had been collected, the groups were told that part of their duties as future leaders would also involve the recognition of other leaders, and that this would often be required on relatively short acquaintance with those to be selected as leaders. They
were then asked to rate the other members of their 
group from best through worst on the basis of lead- 
ership ability. 

Selection as best was given a score of seven, next 
best was scored six, and so on down to one point 
for poorest. The seven- and nine-man groups were 
adjusted to the eight-man score. The maximum score was then 49 (seven raters × seven points), and the poorest score was seven (seven raters × one point).
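The forced-distribution scoring for an eight-man group can be sketched in Python. The sketch covers only the eight-man case, since the text does not say exactly how the seven- and nine-man groups were adjusted; the member labels, the alphabetical rankings, and the function name are hypothetical.

```python
from collections import defaultdict


def lgd_status(rankings: dict[str, list[str]]) -> dict[str, int]:
    """Forced-distribution peer score for one eight-man LGD group.

    rankings maps each rater to the other seven members ordered best -> worst.
    Best earns 7 points, next best 6, ..., poorest 1; a member's status is
    the sum over the seven raters, so it ranges from 7 to 49.
    """
    scores: dict[str, int] = defaultdict(int)
    for rater, order in rankings.items():
        if rater in order or len(order) != 7:
            raise ValueError(f"{rater} must rank exactly the seven other members")
        for position, ratee in enumerate(order):
            scores[ratee] += 7 - position
    return dict(scores)


members = list("ABCDEFGH")  # hypothetical eight-man group
# Hypothetical case: every rater orders the other members alphabetically
rankings = {r: [m for m in members if m != r] for r in members}
status = lgd_status(rankings)
print(status)
```

In this contrived case A is placed first by all seven of the other members and reaches the maximum of 49, while H is placed last by everyone else and receives the minimum of 7.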


Such instructions, procedure, and scoring differ 
considerably from those of the buddy-rating com- 
ponent of the final leadership criterion measure 
(given four weeks later). In the latter, detailed and 
relatively structured explanations of rating philoso- 
phy preceded the actual peer rating. Such explana- 
tions varied from group to group according to the 
inclinations of the instructors. Such groups varied 
in size from eight to fifteen men, while only the top 
three and the bottom three men were to be selected 
as representing above-average or below-average per- 
formance. Weightings were then assigned to each 
man on the basis of these ratings. Those men not 
mentioned were given points in between those given to the above-average and below-average students.


Results 


It is first noted that all correlations given 
in this paper are product-moment correlations. 
Table 1 suggests that those summed ratings 
which comprise the final leadership criterion 
reflect varying concepts of leadership quality. 
The components of the final score show a maximal intercorrelation of .54, and Buddy Rating and Situational Test Rating correlate only .14. Thus, prediction must be made within an obviously imperfect system, and predictive accuracy in this study has been reduced by the unreliability of the external validity criterion.

Since no correction for attenuation due to this unreliability is made in the following statistics, these correlations will understate the true relationships.
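For reference, the standard Spearman correction that the text declines to apply divides an observed correlation by the square root of the product of the two measures' reliabilities. The sketch below is illustrative only: the study reports no reliability coefficients, so the values .81 and .64 are assumptions, and the function name is hypothetical.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction: estimated correlation between true scores,
    r_xy / sqrt(r_xx * r_yy), given the reliabilities of the two measures."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Illustration only: reliabilities of .81 and .64 are assumed, not reported in the study
print(round(correct_for_attenuation(0.44, 0.81, 0.64), 2))
```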

Table 2 shows the correlations obtained for 


Table 2

Correlations of LGD Status vs. Final Leadership Criterion Measure

Class      N     Correlation
288       56
289       39       .39*
290       48       .49**
292       50       .20
293       45
294       45
295       49       .59**
296       39       .27
297       40       .41*
298       48       .45**
Pooled   459       .44**

* Significant at the 5% level of confidence.
** Significant at the 1% level of confidence.
Table 3


Correlation of LGD Status vs. Buddy Rating 
Component of Final Leadership 
Criterion Measure 


Class N Correlation 


288 56 ol** 
289 39 24 
290 48 33° 
292 50 Ss 
293 45 
294 45 
295 49 62** 
296 39 A7** 
297 40 $1 
298 48 Saag 


66** 


| a 


Pooled 459 .48**


* Significant at the 5% level of confidence. 
** Significant at the 1% level of confidence. 


the ten classes between LGD status (the 
forced distribution score) and the final lead- 
ership criterion measure. The pooled (for all 
ten classes) correlation between LGD status 
and the final leadership criterion score is .44. 
It should be noted that 24 students were eliminated from the course, chiefly because of motivational failure. Analysis of the
effect of such eliminations fails to reveal any 
substantial change in the present correlations. 
The figure of .44 compares favorably with 
those given in previous studies (2, 3, 5) con- 
sidering that no correction for attenuation has 
been made. 
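The text does not say how the class correlations were pooled. One common approach, sketched below in Python as an assumption rather than the author's procedure, averages Fisher z transforms weighted by N − 3 and converts the mean back to r; computing a single r over the combined data is another option. The example uses three of the class values from Table 2 (Classes 289, 290, and 298), and the function name is hypothetical.

```python
import math


def pooled_r(samples: list[tuple[float, int]]) -> float:
    """Pool (r, N) pairs from independent classes by averaging Fisher
    z = atanh(r) with weights N - 3, then transforming back with tanh."""
    num = sum((n - 3) * math.atanh(r) for r, n in samples)
    den = sum(n - 3 for _, n in samples)
    if den <= 0:
        raise ValueError("need at least one sample with N > 3")
    return math.tanh(num / den)


# (correlation, N) for Classes 289, 290, and 298 in Table 2
print(round(pooled_r([(0.39, 39), (0.49, 48), (0.45, 48)]), 2))
```

Averaging in the z metric rather than averaging r directly keeps classes with correlations near the extremes from being under-weighted, because the sampling distribution of z is roughly normal with variance 1/(N − 3).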

Table 3 demonstrates the correlations ex- 
isting between the LGD status and the buddy- 
rating component of the final leadership cri- 
terion composite for all ten classes. The 
pooled correlation between these two assess- 
ments is seen to be .48. Such a relationship 
reflects the overlap contributed by having a 
similar assessment in both the predicting de- 
vice and the final criterion measure. 

Close inspection of the different correlations 
(by class) in Table 2 shows a marked varia- 
tion in these measures of predictive accuracy. 
A partial explanation for this variation is 
found in the composition of these groups. 
Due to administrative conditions over which 
the author had no control, four of the ten 
classes derived their population from single 


basic training companies. Thus, each of 
Classes 293, 294, 295, and 297 was com- 
posed of relative nonstrangers who had spent 
16 weeks of basic training together. It should 
be noted that there are approximately 250 
men in a basic training company, and there 
are varying degrees of social interaction which 
are dependent on platoon and squad assign- 
ment. The other six classes derived their men 
from two basic training companies each and 
were composed of a mixture of strangers and 
nonstrangers. The effects of such varying 
compositions are reflected in the correlations 
in both Tables 2 and 4. Table 2 shows that 
the nonstranger classes demonstrated gener- 
ally higher individual correlations. The mixed 
classes (stranger and nonstranger) show gen- 
erally lower correlations. Classes 288 and 
297 are exceptions. 

Table 4 gives the pooled correlations between LGD status and the final leadership criterion measure. This table shows the pooled correlations between LGD status and Leaders Course final score to be .48 for the nonstranger classes and .35 for the mixed classes of strangers and nonstrangers. It therefore seems reasonable to infer that influences other than LGD performance operate in nonstranger classes and that these improve the predictive ability of LGD status.

Table 4

Correlations of LGD Status vs. Final Leadership Criterion Measure in Classes of Stranger and Nonstranger Composition

Group                                 N     Correlation
Nonstranger†                         179      .48**
Mixed Stranger and Nonstranger‡      280      .35**

** Significant at the 1% level of confidence.
† Classes 293, 294, 295, and 297. Each class derives its population from a single basic training company.
‡ Classes 288, 289, 290, 292, 296, and 298. Each class derives its population from two separate basic training companies.

Table 5

Correlations of LGD Status vs. Final Leadership Criterion Measure in Mixed Classes of Strangers and Nonstrangers

                      Validity Correlations
Class      N     Stranger†     Nonstranger‡
288       56       .39**
289       39                      .26
290       48       .14            .36*
292       50
296       39       .23
298       48       .39**          .43**
Pooled   280       .29**          .37**

* Significant at the 5% level of confidence.
** Significant at the 1% level of confidence.
† Stranger indicates LGD status based on ratings by total strangers.
‡ Nonstranger indicates LGD status based on ratings by nonstrangers, i.e., members from the same original basic training company.

To ascertain the influence of prior experience with LGD members in other situations, an additional analysis of the data from the mixed classes was made. The author attempted to analyze these classes in terms of the predictive ability of LGD status based upon strangers rating strangers, and nonstrangers rating nonstrangers. This exploration required expanding scores based on two to five raters into a seven-man rating score. Thus the LGD status of a subject was sometimes based on the judgments of only two raters, and the scores (from 2 to 14 points) were multiplied by a factor of 3½. At other times the judgments were made by five raters, and the scores were multiplied by a factor of only 1.4. These estimates, then, do not derive from as many raters as the correlations presented in Tables 2 and 4, and rater bias is thus exaggerated. Since this bias could operate in either a positive or a negative direction, and since such a bias is reflected in both stranger and nonstranger components within the mixed classes, such influences are thought to cancel each other out.
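The expansion to a seven-rater basis amounts to multiplying a score from k raters by 7/k (7/2 = 3½ for two raters, 7/5 = 1.4 for five). A minimal sketch, assuming that simple proportional rule; the function name is hypothetical, and exact fractions keep 3½ from being rounded.

```python
from fractions import Fraction

FULL_PANEL = 7  # raters per member in an eight-man group


def expanded_status(raw_points: int, n_raters: int) -> Fraction:
    """Scale a status score built from n_raters judgments up to the
    seven-rater basis by multiplying by 7 / n_raters."""
    if not 1 <= n_raters <= FULL_PANEL:
        raise ValueError("expected between 1 and 7 raters")
    return raw_points * Fraction(FULL_PANEL, n_raters)


# Two raters: the 2-14 point range maps onto the 7-49 range of an eight-man group
print(expanded_status(14, 2), expanded_status(2, 2))
# Five raters: the multiplier is 7/5
print(Fraction(FULL_PANEL, 5))
```

Under this rule every expanded score lands on the same 7 to 49 scale as an eight-man group, but it carries the idiosyncrasies of fewer raters, which is the source of the exaggerated rater bias noted above.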

Table 5 demonstrates that raters who have 
served in the same basic training companies 
with the men whom they rate generally pre- 
dict the Leaders Course success of these men 
better than they do that of total strangers. 
This table shows the nonstranger-by-non- 
stranger rating in LGD to be correlated .37 
with the final leadership criterion measure. 
On the other hand, Table 5 demonstrates that 
LGD status based on ratings by total strang- 
ers has a lower predictive ability (r = .29). 
Thus, again it seems that prior knowledge of 
the ability of LGD members is moderately 
helpful in predicting the leadership talents of 
such members. 
It is interesting to speculate on the differences in relationship strengths between Tables 4 and 5. If Table 4 ratings are as reliable as Table 5 ratings (and this is not at all certain), then mixing strangers into an already established group appears to lower the leadership predictive ability of ratings among people who already have some acquaintance with one another. This speculation should be verified in some future study.

Since information was readily available on 
intellectual capacity, it appeared that com- 
putations of the correlation between LGD 
status and Area Aptitude I scores might prove 
interesting. This latter test was one of a battery of classification tests routinely given to men entering the Army, and roughly approximates a measure of general intelligence, although it was not specifically designed as
such. Table 6 shows the pooled correlation 
between LGD status and Area Aptitude I 
score to be .30. This correlation corroborates 
the findings of Wurster and Bass (5) who cor- 
related LGD status with the Ohio State Psy- 
chological Examination and with the American 
Council on Education Psychological Examina- 
tion. Since in Army situations, intelligence 


Table 6

Correlations of Intelligence vs. Final Leadership Criterion Measure and of Intelligence vs. LGD Status

                    Intelligence vs. Final           Intelligence vs.
Class     N†        Leadership Criterion Measure     LGD Status
288       56        .18
289       39        .16                              .16
290       48        .22
292       50        .16                              .25
293       42        .16                              .03
294       44        .28                              .35*
295       47        .35*                             .34*
296       39        .29
297       39        .48**                            .29
298       48        .76**
Pooled   451        .32**                            .30**

* Significant at the 5% level of confidence.
** Significant at the 1% level of confidence.
† These Ns differ partially from previous Ns since intelligence scores, for various administrative reasons, were not available for eight subjects.
has proved to be one of the best predictors of
leadership ability, correlations between Area 
Aptitude I scores and final scores in Leaders 
Course were computed. The pooled correla- 
tion between intelligence as reflected on the 
Area Aptitude I score and success in Leaders 
Course is .32. 

By way of recapitulation it should be noted 
that artificial restrictions were placed on the 
sample. The sample is selective in terms of 
the intelligence and performance ability of the 
included subjects. This probably results in 
an understatement of the various relation- 
ships. As Bass (2) points out, such restric- 
tions are likely to reduce the forecasting abil- 
ity of the LGD. In a wider sample, such as 
all graduates of basic training, the leadership 
predictive ability may well be considerably 
higher. 
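The effect of the selective sample can be illustrated with Thorndike's Case 2 formula for direct restriction of range, which the study itself does not apply. This is a sketch under assumptions: it treats selection as acting directly on the predictor, which may not match how Leaders Course students were chosen, and the SD ratio of 1.5 is hypothetical.

```python
import math


def correct_for_range_restriction(r: float, sd_ratio: float) -> float:
    """Thorndike's Case 2 correction for direct restriction on the predictor.
    sd_ratio = predictor SD in the unrestricted group / SD in the restricted sample."""
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    return r * sd_ratio / math.sqrt(1 - r**2 + (r * sd_ratio) ** 2)


# Hypothetical: predictor SD among all basic-training graduates is 1.5 times
# its SD among Leaders Course students
print(round(correct_for_range_restriction(0.44, 1.5), 2))
```

With no restriction (a ratio of 1) the formula returns the observed r unchanged; any ratio above 1 raises the estimate, which is the direction of the author's expectation for a wider sample.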


Summary 


Basic training graduates assigned to the 
Fort Knox, Kentucky, leadership school par- 
ticipated in a leaderless group discussion ex- 
periment. Ten classes, ranging between 39 
and 56 men, were included in the study for 
a total of 459 subjects. The status in the leaderless discussion group for each subject was correlated with final leadership performance in the school. Leaderless group status in this study, as opposed to the observer rating system of previous experiments, was determined by a forced distribution peer rating
immediately following the leaderless discus- 
sion period. Correlation with validity cri- 
teria was .44 for a pooled group of 459 sub- 
jects. Analysis was also made of the effect 
of prior acquaintance with LGD members on 
LGD status. This was found to be influ- 
ential in improving the predictive ability of 
the LGD method. Analysis of the correla- 
tion of intelligence versus LGD status and in- 
telligence versus success in leadership school 
showed these to be .30 and .32, respectively. 
The method is recommended for future study 
as a simple, quantitative mass-selection tech- 
nique. 


Received September 24, 1956.
Evaluation of a New Army Visual Testing Instrument¹


Julius M. Peters, Leon G. Goldstein, and Melvin R. Marks 
Personnel Research Branch, PRPD, The Adjutant General’s Office, Department of the Army 


Introduction 


Based on conclusions derived from a num- 
ber of comparative studies by the Personnel 
Research Branch, TAGO (4) and the Medi- 
cal Research Laboratory, BUMED (2, 3), 
the Armed Forces-National Research Council Vision Committee Working Group tentatively approved the use of an instrument² for
testing visual performance, including visual 
acuity, at a standard level of illumination. 
However, a number of the operating charac- 
teristics of this instrument (hereafter called 
Instrument A) were regarded as unsatisfac- 
tory. It was felt that a vision testing device 
suitable for use in a wide range of service 
situations should: 

1. Provide a means for monitoring and con- 
trolling current supplied for illumination. 

2. Be a compact and sturdy instrument. 

3. Feature a type of head rest easily 
adapted to the variations in head size and to 
differences in the physical size of examinees. 

4. Use a type of carrier which would allow 
easy introduction and withdrawal of vision 
targets with small probability of damage to 
the targets or the machine. 

Since Instrument A was deficient in these 
characteristics, personnel of the Surgeon Gen- 
eral’s Office sponsored the development of a 
new vision testing device; a model incorporat- 
ing their modifications was constructed (here- 
after referred to for convenience as Instru- 
ment B). Personnel of the Personnel Re- 
search Branch were asked to evaluate the 
newly developed Instrument B. 

The primary objectives of the evaluation 
were to determine: (a) the equivalence and
relationship of Instrument B and the Stand- 
ard Wall Chart Visual Acuity Examination 
(WC) in terms of obtained test scores; and 


¹ The opinions expressed are those of the authors and do not necessarily represent opinions of the Department of the Army.

² This instrument is the Armed Forces Vision Tester, which is a modification of the Ortho-Rater manufactured by the Bausch and Lomb Co.
(b) the reliability of the test scores obtained with Instrument B in terms of test-retest correlation of scores. The opportunity provided by this evaluation study was also taken to check the equivalence of Instrument B and Instrument A and the comparability of Instrument A and WC scores.


Method 
Phases 


The data for this study were gathered in three phases, partly determined by the objectives of this study and partly by the availability of examinees.

In the first phase, the following counterbalanced 
order of testing was used to furnish information on 
comparability of the acuity test scores: 


A    B    WC
A    WC   B
B    A    WC
B    WC   A
WC   A    B
WC   B    A


In the second phase of the study, another group of 
examinees were administered visual acuity examina- 
tions with just Instrument B and the WC to gather 
data on the equivalence of Instrument B and WC 
scores without Instrument A contamination. 

In the final phase of the study, examinees in the 
first and second phases who took the Instrument B 
and then the WC visual acuity examinations were 
readministered the Instrument B visual acuity ex- 
amination. The retests were administered at least 
one week after the initial test; all testing was com- 
pleted approximately one month after the testing 
was begun. 


Sampling 


From the group of about 900 soldiers regularly sta- 
tioned at the Walter Reed Army Medical Center, 234 
were randomly selected to participate in this study. 
The men ranged in age from 19 to 58 years (mean age, 27.1 years; standard deviation, 6.5 years). All
examinees who customarily wore glasses used them 
in the visual acuity examinations. 

The first 126 men selected at random were given 
the examinations in the first phase of the study so 
that data on 21 cases would be gathered for each of 
the six testing orders. One hundred and eight men 
were selected at random for the second phase of the 
study. Twenty-one men who took the Instrument 
B and the WC in that order in the first phase were
added to the 108 men of the second phase for re- 
testing with Instrument B in the third phase of the 
study. 
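The first-phase design places 21 men in each of six testing orders. A minimal sketch of such an allocation follows, assuming the six orders are all permutations of the three instruments and that men were allocated to orders by simple randomization (the report says only that the men were selected at random); the function name `assign_counterbalanced` and the instrument labels are hypothetical.

```python
import itertools
import random

# Hypothetical labels for the three tests
INSTRUMENTS = ("B", "WC", "A")


def assign_counterbalanced(examinee_ids, per_order=21, seed=None):
    """Randomly split examinees into equal groups, one per testing order.

    Assumes full counterbalancing: every permutation of INSTRUMENTS (3! = 6)
    is used once, with per_order examinees in each.
    """
    orders = list(itertools.permutations(INSTRUMENTS))
    ids = list(examinee_ids)
    needed = per_order * len(orders)
    if len(ids) != needed:
        raise ValueError(f"need exactly {needed} examinees, got {len(ids)}")
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Consecutive blocks of the shuffled list each receive one order
    return {
        order: ids[k * per_order:(k + 1) * per_order]
        for k, order in enumerate(orders)
    }


groups = assign_counterbalanced(range(1, 127), seed=1)
for order, members in groups.items():
    print(" -> ".join(order), len(members))
```

The fixed seed only makes the illustration reproducible; with 126 ids the call yields six groups of 21.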


Apparatus 


The Armed Forces Standard Clinical Wall Chart 
and Test Alley. The WC presents nonserif letters in 
a series of lines ranging in terms of Snellen scores 
from 20/200 to 20/10. Specifications for WC are 
given in publications of the Personnel Research 
Branch, PRPD, TAGO, Department of the Army, 
on visual acuity (4). In the present study, the 
chart was placed at a distance of 20 ft. from the 
examinee in a test alley constructed according to 
standards recommended by the Armed Forces-NRC 
Vision Committee. The average level of illumination 
measured by a MacBeth Illuminometer was 10.56 
log micromicrolamberts (34.13 footlamberts). 

Instrument B. Instrument B presents test tar- 
gets at selected visual acuity levels trans-illuminated 
from behind. Essentially, it is an optical system 
which simulates the distance of 20 ft. at which the 
targets seem to be. The targets presented in the 
machine were made by photographic reduction of 
letters, like those used in the WC, to .0328 of actual 
size. The range of Snellen acuity levels is the same 
as that of the WC, except that there are more acuity 
gradations between 20/200 and 20/10, and a 20/400 
line of letters is added. However, only acuity levels 
common to both tests were used in the present study. 
The average level of illumination measured by a MacBeth Illuminometer was 10.50 log micromicrolamberts (29.09 footlamberts).

Instrument A. Instrument A, like Instrument B, 
is essentially an optical device used to simulate the 
distance of 20 ft. at which targets seem to be 
viewed. The test targets are trans-illuminated from 
behind. The targets used in the present study were 
the Armed Forces Vision Tester Plates. The plates 
have a Snellen acuity range from 20/200 to 20/12. 
There are more acuity gradations between the ex- 
tremes than in WC, but only those acuity levels com- 
mon to the WC and the Instrument A were com- 
pared. The average level of illumination measured 
by a MacBeth Illuminometer was 10.52 log micromicrolamberts (30.54 footlamberts). Additional specifications for the test plates used in Instrument A are given in the Instrument A manual (1).
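The paired luminance figures for the three instruments can be cross-checked against each other. The sketch below uses only the exact definitions 1 footlambert = (1/π) cd/ft², 1 lambert = (1/π) cd/cm², and 1 ft = 30.48 cm, so that 1 footlambert = 1/929.0304 lambert. It assumes "log micromicrolamberts" means the base-10 logarithm of the luminance expressed in units of 10⁻¹² lambert; the function name is hypothetical.

```python
import math

# Exact: 1 fL = (1/pi) cd/ft^2 and 1 L = (1/pi) cd/cm^2, with 1 ft = 30.48 cm,
# so 1 footlambert = 1/929.0304 lambert.
LAMBERTS_PER_FOOTLAMBERT = 1 / 929.0304
LAMBERTS_PER_MICROMICROLAMBERT = 1e-12


def log_micromicrolamberts(footlamberts: float) -> float:
    """Base-10 log of a luminance expressed in micromicrolamberts."""
    lamberts = footlamberts * LAMBERTS_PER_FOOTLAMBERT
    return math.log10(lamberts / LAMBERTS_PER_MICROMICROLAMBERT)


for label, fl in [("WC test alley", 34.13), ("Instrument B", 29.09), ("Instrument A", 30.54)]:
    print(f"{label}: {fl} fL -> {log_micromicrolamberts(fl):.3f} log micromicrolamberts")
```

The WC figure comes out near 10.565, which the report gives as 10.56; the Instrument B and Instrument A figures round to the reported 10.50 and 10.52.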


Testing Procedures 


Before the men were tested on the wall chart or 
machines, they were oriented as to the purpose of 
the testing program in which they were participat- 
ing. Each man to be tested filled out a personal data 
sheet giving information on current state of health, 
ocular condition, and rest. When the men were in 
proper position for testing, the examiners read stand- 
ard oral instructions to them. The test targets were 
observed monocularly. First the left eye and then 
the right eye was tested on each test. The examiner 
recorded all errors on appropriate answer sheets.


Scoring 


When the examinee made three consecutive errors 
on any one line, the examination for the eye being 
tested was halted. To meet the objective of the first 
phase of this study, only those lines at Snellen acuity 
levels common to the WC and the targets presented 
in Instrument B and Instrument A were scored. 
The last line on which at least 70% of the letters were read correctly was recorded, in terms of visual angle, as the score for the eye examined. For the first two lines of the targets, on which fewer than four letters are presented, all letters had to be read correctly for score credit. For purposes of computing correlation coefficients from the data of the second and third phases, the total raw score for each eye was used. The total raw score was the total number of letters correctly identified before three consecutive errors.
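To make the scoring rules concrete, here is a sketch of one way to apply them to a single eye. It assumes a hypothetical record format (each chart line given as its Snellen denominator plus a list of per-letter right/wrong marks, largest letters first), reads "70%" as "at least 70%," and converts a Snellen fraction 20/x to x/20 minutes of visual angle. Restricting scoring to the acuity levels common to all instruments is left out, and the function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple

# One chart line: (Snellen denominator, per-letter marks in reading order)
Line = Tuple[int, Sequence[bool]]


def score_eye(lines: Sequence[Line]) -> Tuple[Optional[float], int]:
    """Return (visual angle score in minutes, total raw score) for one eye.

    Lines are given largest letters first. Testing halts at three
    consecutive errors on one line.
    """
    best_angle: Optional[float] = None  # smallest angle meeting the line criterion
    raw_score = 0                       # letters correct before the halt
    for denominator, marks in lines:
        correct = 0
        run_of_errors = 0
        halted = False
        for is_correct in marks:
            if is_correct:
                correct += 1
                run_of_errors = 0
            else:
                run_of_errors += 1
                if run_of_errors == 3:
                    halted = True
                    break
        raw_score += correct
        n = len(marks)
        # Lines with fewer than four letters must be read perfectly;
        # otherwise at least 70% correct (integer test avoids float rounding).
        passed = correct == n if n < 4 else 10 * correct >= 7 * n
        if passed:
            best_angle = denominator / 20  # Snellen 20/x -> x/20 minutes of arc
        if halted:
            break
    return best_angle, raw_score


record = [
    (200, [True]),
    (100, [True, True]),
    (70, [True] * 4),
    (50, [True, True, True, False, True]),
    (40, [True, False, True, True, True]),
    (30, [False, False, False, True, True, True]),
]
print(score_eye(record))  # (2.0, 15)
```

In this made-up record the examination halts on the 20/30 line, so the score is the 20/40 line (2.0 minutes) and the raw score counts the 15 letters read correctly before the halt.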


Results 
Equivalence of the Tests 
Frequency distributions and cumulative percentage distributions of visual angle scores on
Table 1

Cumulative Percentage Distributions of Visual Angle Scores
(126 right eyes; 126 left eyes)

[Table body illegible in the source scan: cumulative percentages at each visual angle score (minutes) for the WC, Instrument B, and Instrument A.]

* No test targets at indicated acuity level.
Table 2

Equivalent Visual Angle Scores on Instrument B and Instrument A Using Wall Chart as the Standard
(N = 252 Eyes: 126 Right Eyes plus 126 Left Eyes)

Visual Angle Equivalents (minutes)
WC      Inst. B   Inst. A
1.50    1.45      1.65
1.25    1.25      1.40
1.00    1.00      1.10

[The remaining rows and the Snellen-equivalent columns (WC, Inst. B, Inst. A) are illegible in the source scan.]

WC, Instrument B, and Instrument A for the 
126 right eyes and the 126 left eyes examined 
in the first phase of the study are presented 
in Table 1. In general, the WC and the In- 
strument B test scores were reasonably com- 
parable; WC and Instrument A scores at the 
smaller visual angles were not as comparable. 

Using visual angle scores on the WC as the 
criterion and noting the cumulative percent- 
age of the group who achieved each of these 
scores, equivalent scores (in terms of visual 
angle and in terms of Snellen Units) on In- 
strument B and Instrument A were computed 
by equipercentile analysis. The equivalents 
are presented in Table 2. The WC and the 
Instrument B scores were almost exactly equivalent in the operationally crucial range from 20/30 to 20/10.
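The report does not spell out its interpolation rule, so the following is only a textbook-style sketch of equipercentile equating. A score on the standard (here, the WC) is mapped to the score on the other instrument that has the same percentile rank, with percentile rank taken as the proportion of scores below plus half the proportion at the score. All function names are hypothetical.

```python
from typing import Sequence


def percentile_rank(scores: Sequence[float], value: float) -> float:
    """Proportion of scores below value plus half the proportion equal to it."""
    below = sum(s < value for s in scores)
    equal = sum(s == value for s in scores)
    return (below + 0.5 * equal) / len(scores)


def score_at_rank(scores: Sequence[float], rank: float) -> float:
    """Approximate inverse of percentile_rank, interpolating between sorted scores."""
    ordered = sorted(scores)
    n = len(ordered)
    pos = min(max(rank * n - 0.5, 0.0), n - 1.0)  # fractional index
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def equipercentile_equivalent(standard: Sequence[float], other: Sequence[float],
                              value: float) -> float:
    """Score on `other` whose percentile rank matches that of `value` on `standard`."""
    return score_at_rank(other, percentile_rank(standard, value))
```

With real data, `standard` and `other` would hold the 252 WC scores and the 252 Instrument B (or Instrument A) scores for the same eyes, and the function would be called once for each WC score level.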
Table 3

Correlation Between Wall Chart and Instrument B Raw Scores

              N       r
Right Eyes    127*    .92
Left Eyes     128     .94
All Eyes      255     .95

* One case in which eye was blind.


Relationship Between WC and Instrument B 


Correlation coefficients between WC and 
Instrument B raw scores for right eyes, left 
eyes, and all eyes are shown in Table 3. The correlations between WC and Instrument B raw scores were high, ranging from .92 for right eyes to .95 for all eyes combined.


Reliability of Instrument B Scores 


In Table 4, test-retest reliability coefficients 
for Instrument B scores are presented. The 
magnitudes of these coefficients (.95 to .97) indicate that the reliability of Instrument B scores was very high.
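The report does not state how its coefficients were computed. Assuming the ordinary Pearson product-moment r, the usual choice for test-retest reliability, a self-contained sketch is below; the same function would serve for the WC-versus-Instrument B correlations in Table 3. The function name and the five pairs of scores are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical test and retest raw scores for five eyes
test = [119, 95, 132, 108, 141]
retest = [121, 97, 128, 110, 140]
print(round(pearson_r(test, retest), 3))
```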


Summary and Conclusions 


The objective of the present study was to 
evaluate a new device, Instrument B, devel- 
oped by personnel of the Surgeon General’s 
Office for testing photopic visual acuity. The 
evaluation was made by checking the equiva- 
lence and relationship of Instrument B and 
Standard Wall Chart Visual Acuity Examina- 
tion (WC) test scores and by determining the 
test-retest reliability of Instrument B scores. 
Opportunity was also taken to check the comparability of Instrument A scores to Instrument B and WC scores. The study was made in three phases in which 234 men were given the three visual acuity tests, testing each eye separately, under photopic levels of illumination.

Table 4

Means, Standard Deviations, and Test-Retest Reliability Coefficients for Instrument B Scores

                        Test            Retest
              N      Mean    SD      Mean    SD      r
Right Eyes   125    119.3   22.9    119.6   22.8    .95
Left Eyes    126    108.9   31.7    108.5   31.7    .97
All Eyes     251    114.1   28.2    114.0   28.2    .97

1. The results indicate that the distributions of scores on far visual acuity examinations with Instrument B and the WC were reasonably comparable, more so than those obtained with Instrument A.

2. The correlations between Instrument B 
scores and WC scores were in the .90’s. 

3. The test-retest reliability of scores ob- 
tained with Instrument B was in the high .90’s. 

4. On the basis of the results of this study, 
it was concluded that test scores obtained with Instrument B may be used interchangeably with those of the WC. Since Instrument B has the advantages of portability, compactness, and sturdiness, it appears to be a more convenient means of testing visual acuity under standard conditions almost anywhere than the relatively fixed-location test alley constructed in accordance with the standard specifications of the Armed Forces-National Research Council Vision Committee.


Received October 31, 1956. 
A Comparison of Counseled and Noncounseled Industrial School Students¹


C. H. Patterson²


University of Illinois 


Previous papers (7, 8) have reported some 
of the results of a study of 1,011 students 
enrolling in an industrial institute. Of this 
group, 537 were nondisabled veterans study- 
ing under P.L. 550 (Korean G.I.). Since vo- 
cational counseling services are available on 
a voluntary basis to these veterans, the op- 
portunity was present to compare those who 
had availed themselves of such counseling 
with those who had not. 

A total of 69 were counseled by Veterans 
Administration counselors, most of them by 
counselors in one University Veterans Coun- 
seling Center. Counselors did not have avail- 
able the information on the relationship be- 
tween certain tests and success in the school. 
At the conclusion of counseling, the counselor 
was required to indicate whether he concurred 
or did not concur in the vocational training 
objective selected by the client. It was hoped 
that those approved might be compared with 
those whose choice was not approved. Only 
8 choices were not concurred in by the coun- 
selor, however, so these cases were discarded 
from the study. Actually, in many cases, the 
client did not reach a specific choice. In such 
cases, if the objective in which the veteran 
enrolled at the school was one which had been 
recommended or favorably considered by the 
counselor, or was closely related to specific 
objectives, or in an area approved by the 
counselor, concurrence was assumed. 


Description of the Sample 


The counseled and noncounseled groups are compared in Table 1 for age, education, number of previous math, science, and shop courses taken, and scores on the tests administered on entrance to the school. Only the Kuder Mechanical and Scientific scales are included in the comparison, since these were the only scales that were significantly related to persistence in training in the preliminary study. None of the differences between the counseled and noncounseled groups is significant; individual matching of cases was therefore not necessary. The combined veteran group is somewhat older and has more education than the total tested group (7).

¹ Data included in this study were collected while the writer was employed as a counseling psychologist, V. A. Regional Office, St. Paul, Minnesota. Opinions and conclusions expressed are those of the author and do not necessarily reflect those of the Veterans Administration.

² I am indebted to Miss Joyce Wall for assistance in the computations.


The Criteria 


Two criteria are used. The first is persist- 
ence in school. This criterion presumably in- 
cludes the influence of factors other than apti- 
tude and ability, such as interest, motivation, 
and personality. Such factors enter into the 
counselor’s approval or disapproval of a vo- 
cational objective. This criterion is also in- 
fluenced by factors beyond the student’s con- 
trol, however, such as illness, family illness 
or death, and financial difficulties. The last 
should not be a particularly important reason 
for leaving school in the case of veterans 
receiving government subsistence payments. 
The other fortuitous factors are relatively 
rare, even as reasons given by the student for 
leaving. This criterion has the advantage of 
including all the students in the sample. At 
the time of the follow-up, every student had 
an opportunity for at least 8 months of attendance. Persistence is thus measured as the number of months completed, up to a maximum of eight; some students, of course, had completed more, up to the full course of 18 months.

The second criterion is grades. Students 
are graded monthly in shop work, job knowl- 
edge, related subjects, and general subjects. 
For the purpose of this study all grades ob- 
tained by the student during his stay in 
school, through a maximum of eight months, 
were averaged. This criterion has the disadvantages that the number of grades entering the average, and therefore the reliability of the average, varies, and that students not completing one month of work are not included, since no grades were available for them. The two criteria no doubt overlap, but they also supplement each other. It might be hypothesized that the criterion of persistence would be more appropriate as a test of the effect of counseling, since it includes factors other than aptitude and ability. However, it is contaminated by fortuitous factors, which is less likely to be the case with grades.

Table 1

Counseled and Noncounseled Students: Descriptive Data

                                       No. Shop   No. Math   No. Science
                  N     Age    Educ.   Courses    Courses    Courses
Counseled        69   23.57    11.40    2.25       1.91       1.30
Noncounseled    468   23.56    11.41    1.99       1.96       1.27

[Mean scores on the AGCT, the Bennett, the RMPFB, and the Kuder Mechanical* and Scientific* scales appear in the source but cannot be matched reliably to their columns in the scan.]

* Ns for the Kuder are 47 for the counseled and 347 for the noncounseled; the Kuder was not administered to students enrolling during the latter part of the period covered by the study.


Results 


Since the distribution of the number of months in school is highly skewed, a t test of the difference between means is not appropriate. Chi square was used to test the homogeneity of the two distributions. Table 2 presents the data. The null hypothesis is not rejected: there is no significant difference in persistence in training between the counseled and noncounseled students.
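Because the counts in Table 2 are not legible in the scan, the sketch below uses made-up counts purely to show how a chi-square test of homogeneity is computed for an r × k table of frequencies, such as two groups by months-completed categories. It returns the statistic and its degrees of freedom, which would then be compared with a tabled critical value. The function name and counts are hypothetical.

```python
from typing import Sequence, Tuple


def chi_square_homogeneity(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson chi-square test of homogeneity for an r x k table of counts.

    Each row is one group (e.g. counseled, noncounseled); each column is one
    category (e.g. months completed). Columns with a zero total must be
    merged or dropped first, since their expected counts would be zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Illustrative counts only, not the study's data
stat, df = chi_square_homogeneity([[10, 20], [20, 10]])
print(f"chi-square = {stat:.2f} with {df} df")  # chi-square = 6.67 with 1 df
```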

Distributions of grades for the two groups 
were approximately normal. The variances 
were not significantly different. Where 1 is 
the top grade, and 5 the lowest, the average 
grade of the counseled students was 3.06, and 
the average grade of the noncounseled stu- 
dents was 2.91. The difference is not signifi- 
cant (t = 1.44). 
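The report gives the two means (3.06 and 2.91) and t = 1.44, but not the standard deviations or the numbers of students with grades, so the result cannot be recomputed here. As a sketch of the computation involved, the pooled-variance t below (appropriate when, as reported, the variances do not differ significantly) works from summary statistics; the function name and the inputs in the usage line are hypothetical.

```python
import math
from typing import Tuple


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Two-sample t with pooled variance (equal population variances assumed)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


print(pooled_t(5.0, 2.0, 8, 3.0, 2.0, 8))  # (2.0, 14)
```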

Discussion 

Negative results should require no explana- 
tion or interpretation. However, in dealing 
with groups of counseled and noncounseled 
subjects, we can never know whether they are 
equivalent in all respects (9). In the present 
study the two groups are equivalent in age, 
education, pertinent abilities, and interests. Motivation and personality factors are not known. The counseled group may be lacking in certain respects in these areas, these lacks
perhaps being responsible for their seeking 
counseling. Some have suggested that if 
counseled subjects are deficient in pertinent 
respects, then if they perform as well as the 
noncounseled, this is evidence of the value of 
counseling. 

There is evidence that counseled students 
underachieve in terms of their aptitude. As- 
sum and Levy (1) found that a group of 71 
college students who requested and received 
personal adjustment counseling, while not sig- 
nificantly different in aptitude from 71 con- 
trols, were inferior in achievement. In the 
study of Kaess and Long (5), the control group, which was not matched for adjustment, aptitude, or achievement with the experimental (counseled) group, did not differ in aptitude but was superior in achievement both prior to and following counseling.
Both groups showed gains in grades during 
the period under study, but the counseled did 
not improve more than the noncounseled. 

Table 2

Persistence in Training of Counseled and Noncounseled Students

[Table body largely illegible in the source scan: numbers of counseled (N = 69) and noncounseled (N = 468) students by months of training completed.]

Other studies have failed to show significantly greater persistence in training or improvement in grades following counseling.
Ward, e.g. (10), comparing college freshmen 
groups of veterans matched for age, curricu- 
lum, and college aptitude score (Ns = 66), 
found no significant difference in grades. 
While on an adjustment scale there was no 
difference, Ward suggests that possibly the 
counseled group might have been less well 
adjusted prior to counseling. Blanchard (3) also found little difference between counseled
and noncounseled groups of veterans in terms 
of persistence or success in training (Ns = 
285). He feels that the fact that 28% of 
the counseled group had no well-defined vo- 
cational interests prior to counseling may in- 
dicate that they were lacking in motivation 
compared to the noncounseled group. He 
states that “there is, of course, no means of 
determining how successful these counseled 
men would have been if they had entered 
training without benefit of advisement. Pos- 
sibly their record would have been less satis- 
factory than that of the noncounseled veter- 
ans in this sample” (3, p. 79). 

Some studies have reported significant effects 
of counseling (4). Leis (6) found that coun- 
seled veterans were superior to noncounseled 
in several respects, including persistence in 
training and grade point averages. However, 
the groups were not matched for significant 
factors. A number of deficiencies in the study 
may affect the results. Blackwell (2) also 
found that 40 students given client-centered 
vocational guidance achieved significantly bet- 
ter than 40 noncounseled students matched 
for scholastic aptitude and college class. She 
cautions that “it cannot be concluded that 
this significant difference is due to the guid- 
ance received. . . . This possibility is to be 
considered, however. A second possible cause 
of the significant difference could be attributed 
to the types of personality involved, assuming 
that in the Guidance Group were students who 
were alert, who had intellectual curiosity, and 
who were actively striving for improvement; 
while in the Control Group were students who 
were content to maintain the status quo.” 

So we have a situation where, when coun- 
seling appears not to be effective, it is sug- 
gested that the counseled group was inferior 
to the control group in some significant, but 
unmeasured, respects, while on the other hand,
when a counseled group appears to be su- 
perior to the noncounseled, it is suggested that 
the counseled group was superior in some sig- 
nificant, but unmeasured, respects. Which is 
correct? We do not know. The only ways 
in which we can find out are (a) by ob- 
taining measurements of relevant personality 
and motivational factors for use as controls, or (b) by randomly assigning applicants for counseling to counseled and noncounseled groups. The former method appears impractical, since we do not have adequate measures of relevant factors, and since one relevant factor is probably desire for counseling, which can only be equated by creating a control group in which counseling is denied to some who desire it.


Received November 29, 1956 
The actual usefulness of a psychological test in predicting a given criterion of job success depends both on its validity and on the percentage
of applicants who are selected. A test with 
relatively low validity, under conditions in 
which the selection ratio is low, may select 
more efficiently than does another test with 
relatively high validity but under conditions 
in which the selection ratio is high. 

Tables developed by Taylor and Russell 
(3) indicate the proportion of employees who 
will be satisfactory when selected on the basis 
of a test under varying conditions of validity 
and for varying selection ratios. This propor- 
tion is dependent on the proportion of present 
employees considered satisfactory. 

Smith (2) has suggested certain cautions in the use of the Taylor-Russell tables. They apply only in the case of an elliptical correlation scattergram. Since triangular scattergrams are common, results derived from the tables may be misleading.

Interpreting the usefulness of tests in pre- 
diction by the Taylor-Russell method involves 
(a) the computation of a coefficient of corre- 
lation between test scores and criterion scores, 
(b) access to the tables, and (c) the assump- 
tion (according to Smith frequently not valid) 
that the test scores and criterion scores yield 
an approximately normal bivariate distribu- 
tion. 
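To see what assumption (c) implies, a Taylor-Russell entry can be recomputed directly from a standard bivariate normal distribution. The sketch below does this by numerical integration with Simpson's rule. It is not necessarily how the published tables were produced, and the function name is hypothetical. It returns the expected proportion satisfactory among those selected, given the validity coefficient r, the selection ratio, and the proportion considered satisfactory.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def taylor_russell(r: float, selection_ratio: float, base_rate: float,
                   steps: int = 2000) -> float:
    """Expected proportion satisfactory among selected applicants.

    Assumes test and criterion are standard bivariate normal with
    correlation r (0 <= r < 1). `steps` must be even for Simpson's rule.
    """
    if not 0 <= r < 1:
        raise ValueError("this sketch requires 0 <= r < 1")
    if steps % 2:
        raise ValueError("steps must be even")
    x_cut = STANDARD_NORMAL.inv_cdf(1 - selection_ratio)  # test cutting score
    y_cut = STANDARD_NORMAL.inv_cdf(1 - base_rate)        # criterion cutting score
    conditional_sd = sqrt(1 - r * r)

    def integrand(x: float) -> float:
        # density of test score x times P(criterion > y_cut | test = x)
        p_satisfactory = 1 - STANDARD_NORMAL.cdf((y_cut - r * x) / conditional_sd)
        return STANDARD_NORMAL.pdf(x) * p_satisfactory

    # Simpson's rule on [x_cut, x_cut + 10]; the normal tail beyond is negligible
    lo, hi = x_cut, x_cut + 10.0
    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    p_selected_and_satisfactory = total * h / 3
    return p_selected_and_satisfactory / selection_ratio


print(round(taylor_russell(0.48, 0.20, 0.50), 3))
```

For r = .48, a selection ratio of .20, and half the group satisfactory, the result comes out close to the .77 listed for that condition in Column C of Table 1; with r = 0 the function returns the base rate, as a test without validity should.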


The Problem 


Many users of employment selection tests 
are not accomplished statisticians. Some of 
them may be able to compute correlation co- 
efficients; others may not. It is suggested 
that it would be advisable to provide such 
persons with a simple but adequate method 
of determining the usefulness of prediction 
devices. The method ideally should be as simple as possible to compute and should result in a figure that is meaningful to the person computing it and easily explained to others.

If we may accept textbooks in industrial 
personnel psychology as indicators of teach- 
ing practice, we seem to be providing no 
means of evaluating test validity for the stu- 
dent who is not an accomplished statistician. 
Yet, in practice, selection tests are and no 
doubt will continue to be used by the sta- 
tistically unsophisticated. Welch, Stone, and 
Paterson (5) have provided a simple method 
for developing a weighted application blank. 
Their method involves no complicated statistical formulas. It involves the computation of the predictive efficiency of application blank information by means of a direct comparison of successful and unsuccessful employees in a try-out group. A somewhat similar method is here proposed for determining the predictive efficiency of selection tests.


Method 


What one really wishes to know about a 
selection test is: How much more efficient is 
selection if we use this test than if we do not 
use it? This question can be answered by 
using a simple device. Administer the test 
and secure criterion measures for a try-out 
group. Select a cutting point on the test. 
Select a cutting point on the criterion. Count 
the number of cases falling above both cutting 
points—test and criterion. Compare this 
number with the number that would have 
been secured by random selection. 
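The direct method just described can be sketched as follows. The sketch assumes that the cutting points are expressed as a selection ratio and a proportion satisfactory (as in Table 1), that "above" a cutting point means falling in the top fraction of the try-out group on that measure, and that ties are broken arbitrarily; the function name is hypothetical.

```python
from typing import Sequence


def direct_method(test_scores: Sequence[float], criterion_scores: Sequence[float],
                  selection_ratio: float, proportion_satisfactory: float) -> float:
    """Percentage of the 'selected' try-out cases that are also 'satisfactory'.

    Selected = top selection_ratio of cases on the test;
    satisfactory = top proportion_satisfactory of cases on the criterion.
    Ties are broken by case order (Python's sort is stable).
    """
    if len(test_scores) != len(criterion_scores):
        raise ValueError("need one criterion score per test score")
    n = len(test_scores)
    n_selected = round(selection_ratio * n)
    n_satisfactory = round(proportion_satisfactory * n)
    if n_selected < 1:
        raise ValueError("selection ratio selects no one")
    by_test = sorted(range(n), key=lambda i: test_scores[i], reverse=True)
    by_criterion = sorted(range(n), key=lambda i: criterion_scores[i], reverse=True)
    selected = set(by_test[:n_selected])
    satisfactory = set(by_criterion[:n_satisfactory])
    above_both = len(selected & satisfactory)  # cases above both cutting points
    return 100 * above_both / n_selected
```

Random selection would yield, on average, 100 × `proportion_satisfactory` per cent satisfactory, so the returned figure can be set directly against that baseline.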

If 50% of employees are considered satisfactory and selection is made on a random
basis or by using some method totally lacking 
in validity, then, regardless of the selection 
ratio, 50% of those we employ will be satis- 
Ivan N. McCollom and David A. Savard


Table 1

Comparisons of Accuracy of Prediction from Taylor-Russell Tables and by Direct Method

                           % Satisfactory Predicted
                           from Try-Out Group (N = 128)
Selection   Proportion     Taylor-Russell     Direct
Ratio       Satisfactory   Tables (r = .48)   Method
A           B              C                  D
.20         .50            77                 75
.20         .60            85                 82
.20         .70            91                 89
.20         .80            96                 93
.20         .90            99                 100
.30         .50            73                 76
.30         .60            81                 80
.30         .70            88                 85
.30         .80            94                 90
.40         .50            69                 75
.40         .60            78                 82
.40         .70            86                 86
.40         .80            93                 90
.50         .50            66                 67
.50         .60            75                 77
.50         .70            84                 80
.50         .80            91                 86
.60         .50            63                 66
.60         .60            73                 77
.60         .70            82                 80
.60         .80            90                 86
factory. If by selecting on the basis of the 
test we find that actually 70% of those se- 
lected prove satisfactory, we have improved 
selection. Seventy per cent satisfactory se- 
lection is 40% better than 50% satisfactory 
selection. The use of our test has improved 
selection by 40% over chance (or over the 
method previously used). Even though the 
previous method of selection had some con- 
siderable degree of validity, this method will 
indicate the percentage of improvement, if 
any, over previous selection.¹
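The arithmetic of the preceding paragraph follows the Bellows and Rush formula given in the footnote, here with a and b as percentages and the result expressed in per cent. The function name is hypothetical.

```python
def percent_improvement(a: float, b: float) -> float:
    """Bellows-Rush c = (b - a)/a, expressed as a percentage.

    a: per cent satisfactory without the test (e.g. by random selection)
    b: per cent satisfactory when selecting with the test
    """
    return 100 * (b - a) / a


print(percent_improvement(50, 70))  # 40.0
```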

For practical purposes is this not all we 
need to know? Why try to teach a person, 


¹ Bellows and Rush (1, p. 51) express this as the formula c = (b - a)/a, in which a is the percentage of satisfactory employees without the test, b is the percentage of satisfactory employees with the test used under the conditions described, and c is the percentage of improvement.





% Actually Successful        Differences Between
Follow-up   Follow-up    C and E   C and F   D and E   D and F
Group 1     Group 2
(N = 90)    (N = 86)
E           F            G         H         I         J
53          67           24        10        22        8
68          72           17        13        14        10
79          83           12        8         10        6
89          100          2         4         4         7
89          100          10        1         11        0
56          67           7         6         20        9
67          70           14        11        13        10
74          85           14        3         11        0
81          96           13        2         9         6
60          65           9         4         15        10
69          71           9         7         13        11
74          82           12        4         12        4
80          88           13        5         10        2
58          65           8         1         9         2
64          70           11        5         13        7
76          81           8         3         4         1
87          88           4         3         1         2
62          58           1         5         4         8
67          69           6         4         10        8
76          76           6         6         4         4
87          84           3         6         1         2
Average                  9.7       5.3       10.0      5.6


himself statistically rather naive, how to ex- 
plain the significance of a correlation coeffi- 
cient? The use of the Taylor-Russell tables 
requires that a correlation coefficient be com- 
puted. To compute the correlation, complete 
data on a try-out group are necessary. Yet 
results can be obtained directly from the data 
without using the more arduous method of 
computing the correlation and inspecting the 
tables. 


Comparison with Taylor-Russell Method 


In practice, however, the predictive effi- 
ciency of the test obtained from the direct 
method commonly differs from that obtained 
from the Taylor-Russell tables. The differ- 
ences in results do not necessarily indicate 
that the direct method is the less accurate. It 


has already been mentioned that the Taylor- 








Effectiveness of Tests in Selection 


Russell method is based on an assumption of 
a normal bivariate surface. This assumption, 
in practice, is possibly—in fact quite prob- 
ably—not justified. 

The problem of the standard error of esti- 
mate is inherent in both methods. To deter- 
mine which method is the more adequate re- 
quires successive try-out groups on similar 
populations. Successive try-out groups are 
difficult to obtain in an actual industrial 
situation. Industrial management is not ordi- 
narily friendly toward the continued adminis- 
tration of a test to applicants without the use 
of the results in selection, particularly if it 
has already been demonstrated that the test 
has a useful level of predictive efficiency. 


First Experiment 


To secure a comparable situation to test the rela- 
tive adequacy of the Taylor-Russell method and the 
direct method, data obtained from students in ele- 
mentary psychology courses for three successive se- 
mesters were used. Scores on the Civilian Form of 
the Army General Classification Test were used as 
representative of a selection test and the total of 
the raw scores on five course examinations (the basis 
of the course grade) as the criterion. Enough data 
were available to provide for a try-out group and 
two successive follow-up groups. 

The results are shown in Table 1. Columns A and 
B indicate arbitrarily chosen conditions of selection 
ratio and proportion of the present group considered 
satisfactory, respectively. 

The try-out group consisted of 128 students. The 
coefficient of correlation obtained between the test and the criterion was r = .48. Using the Taylor-Russell tables, the percentages predicted to be satisfactory for the conditions given were obtained and are
given in Column C. Column D indicates the per- 
centage satisfactory found by direct inspection of the 
data and used as a predictor. 
The purpose of either method is to predict the percentage satisfactory in future selection. Columns E and F indicate the actual percentages satisfactory in two follow-up groups under the same conditions.
The differences between predicted and obtained per- 
centages for each of the two methods are shown in 
Columns G, H, I, and J. 

The averages for Columns G and H indicate that 
the average errors in prediction by the Taylor-Rus- 
sell method for the two follow-up groups were 9.7 
and 5.3, respectively. The average errors of the di- 
rect method were 10.0 and 5.6. 
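The "average error" of each method is the mean of the absolute differences in a column of differences. As a check, the sketch below recomputes the direct-method averages from Columns D, E, and F of Table 1 as read from the source scan; the list and function names are hypothetical.

```python
# Columns D, E, and F of Table 1, as read from the source scan
direct_pred = [75, 82, 89, 93, 100, 76, 80, 85, 90, 75, 82, 86, 90, 67, 77, 80, 86, 66, 77, 80, 86]
follow_up_1 = [53, 68, 79, 89, 89, 56, 67, 74, 81, 60, 69, 74, 80, 58, 64, 76, 87, 62, 67, 76, 87]
follow_up_2 = [67, 72, 83, 100, 100, 67, 70, 85, 96, 65, 71, 82, 88, 65, 70, 81, 88, 58, 69, 76, 84]


def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over paired conditions."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


print(round(mean_absolute_error(direct_pred, follow_up_1), 1))  # 10.0
print(round(mean_absolute_error(direct_pred, follow_up_2), 1))  # 5.6
```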

It is apparent that the simpler, direct method has predicted approximately as accurately as the more cumbersome Taylor-Russell method.


Further Experiments 


Data in as large numbers or on more than one 
follow-up group were not available from within an 
actual industrial context. However, sets of data, 
small in number of cases, were secured to provide 
further checks on the method. 

The Wonderlic Personnel Inventory was administered to a group of 34 aircraft workers. The coefficient of correlation between test scores and a criterion of supervisor's ratings was r = .57. The same test was administered to a follow-up group of 16 workers in the same plant. Using selection ratios of .30, .40, .50, and .60 and proportions satisfactory of .50, .60, .70, and .80, the average error in prediction was 3.8 for the Taylor-Russell method and 5.7 for the direct method. These results are shown in Table 2.

Through the courtesy of the authors, data were 
made available from two published sources.² Bellows and Rush (1) have published clerical aptitude
test scores and criterion scores for 60 workers. In 
the present study the first 30 cases were treated as a 
try-out group and the last 30 cases as a follow-up 
group. The same selection ratios and proportions 
satisfactory were used as in the case of the aircraft 


² The writers wish to express their appreciation to Dr. Roger M. Bellows and to Dr. Joseph Tiffin for permission to use data from their respective studies.


Table 2

Comparison of Average Errors of Prediction for Taylor-Russell Method and Direct Method for Three Groups

                                                           Try-Out   Follow-up   Average Error in Prediction
Data Source    Selection Instrument                        Group N   Group N     Taylor-Russell   Direct Method
Aircraft Co.   Wonderlic Personnel Test                    34        16          3.8              5.7
Bellows (1)    Clerical Aptitude Test                      30        30          9.0              2.2
Tiffin (4)     Bennett Test of Mechanical Comprehension    23        23          7.5              9.1
workers. The average errors in prediction are shown
in Table 2. 

Table 2 also presents results using published data
from Tiffin, similarly treated. Of 46 cases for which 
predictor and criterion data were given, the first 23 
were used as a try-out group and the remainder as 
a follow-up group. 


Discussion 


Several sets of try-out and follow-up data 
involving predictor and criterion scores have 
been presented. For each set of data predic- 
tions have been made of the percentage of 
workers, selected under varying conditions, 
who will be satisfactory. For making such 
predictions, two methods have been used, the 
Taylor-Russell method and a simpler direct 
method. The actual number of selected 
workers who were satisfactory was deter- 
mined for each situation with a follow-up 
group. Results predicted with the Taylor- 
Russell method and results predicted with the 
direct method were compared with actual results.
In some cases the Taylor-Russell method was more
accurate; in others the direct method was more
accurate. While the average errors of the
Taylor-Russell method were, on the whole, slightly
less than the average errors of the direct method,
the differences were small.

Ivan N. McCollom and David A. Savard

It is concluded that the differences in re- 
sults obtained by the two methods are so 
small and so variable that the simple, direct 
method might well be substituted for the 
more difficult and cumbersome Taylor-Russell 
method of computing the effectiveness of tests 
in selection. 
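The two prediction methods can be sketched in Python. The sketch assumes the Taylor-Russell figure is the conditional probability that a selected worker is satisfactory under a standard bivariate normal model, with the validity coefficient as the correlation (the model on which the Taylor-Russell tables rest). It also assumes that the "direct method" means reading the proportion satisfactory straight off the top-scoring fraction of the try-out group. That reading of the direct method, every function name, and the toy data are illustrative assumptions, not the authors' procedure.

```python
from statistics import NormalDist, mean

STD = NormalDist()  # standard normal, mean 0 and sd 1


def upper_tail_joint(a, b, r, steps=4000, span=12.0):
    """P(X > a and Y > b) for a standard bivariate normal with correlation r (|r| < 1).

    Integrates phi(x) * P(Y > b | X = x) over x in [a, a + span] with Simpson's
    rule (steps must be even); the tail beyond a + span is negligible here.
    """
    sd_cond = (1.0 - r * r) ** 0.5  # sd of Y given X = x
    h = span / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        f = STD.pdf(x) * (1.0 - STD.cdf((b - r * x) / sd_cond))
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return total * h / 3.0


def taylor_russell(selection_ratio, base_rate, validity):
    """Expected proportion satisfactory among those selected (0 < ratios < 1)."""
    x_cut = STD.inv_cdf(1.0 - selection_ratio)  # predictor cutoff
    y_cut = STD.inv_cdf(1.0 - base_rate)        # criterion cutoff
    return upper_tail_joint(x_cut, y_cut, validity) / selection_ratio


def direct_method(tryout, selection_ratio):
    """Assumed 'direct' estimate: share satisfactory among the top try-out scorers.

    tryout is a list of (predictor_score, is_satisfactory) pairs.
    """
    ranked = sorted(tryout, key=lambda case: case[0], reverse=True)
    n_keep = max(1, round(selection_ratio * len(ranked)))
    return sum(1 for _, ok in ranked[:n_keep] if ok) / n_keep


def average_error(predicted, actual):
    """Mean absolute error, in percentage points, across selection ratios."""
    return mean(abs(100 * (p - a)) for p, a in zip(predicted, actual))
```

With zero validity, `taylor_russell(0.3, 0.6, 0.0)` comes out at about 0.60, the base rate, because a useless test adds nothing. Any positive validity raises the figure above the base rate. Each method's predictions would then be scored against the follow-up group's actual proportions with `average_error`.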


Received June 26, 1956. 
This study reports the effects of 55 to 70
hours of sleep loss on the ability to receive 
and send complex instructions. It was car- 
ried out as part of a larger investigation of 
the physiology and psychology of sleep loss.²
The over-all purpose of the project was to 
shed light on the complex process of fatigue 
by selecting tasks which involved functions of 
importance in real-life situations like combat, 
yet which might be expected to be impaired. 

Because communication is one function 
which is important in almost any kind of co- 
operative task situation, it was decided to de- 
sign a task which would simulate some of the 
most important aspects of communication: 
sending and receiving complex instructions 
accurately. For this purpose we adapted a 
task first used by Bavelas³ and reported in a
study by Leavitt and Mueller (1). As used 
by them, the task uses two Ss, one of whom 
has a pattern of ten domino-shaped pieces be- 
fore him. The other S also has ten pieces but 
is unable to see the correct pattern. It is the 
task of the first S to describe the pattern, 
piece by piece, in such a way that the second 
S is enabled to build it accurately. The de- 
gree to which one permits the second S to ask 
questions, etc., depends on the kind of com- 
munication one wishes to simulate. Because 
we wanted to simplify the situation as much 
as possible we chose as our model the “no 
feedback” situation in which the receiver is 
not permitted to say anything at all during 
the sending of the instructions. This choice 
permitted us to “standardize” the receiver 
when we were observing accuracy of sending 
by having S give his instructions into a recorder,
and to “standardize” the sender when we were
observing the accuracy of receiving by having S
listen to tape-recorded instructions.

1 Now at School of Industrial Management, Massachusetts
Institute of Technology.

2 Results of other portions of this project will be
published as Research Reports, Walter Reed Army
Institute of Research, Washington, D. C.

3 The use of this task was first described to me by
Professor Bavelas in a seminar at the Massachusetts
Institute of Technology.


Method 
Receiving Task 


The S sat at a table in front of a tape recorder
with 10 domino-shaped pieces, 2 × 1 in. in size. Each
problem consisted of a set of instructions which told
S how to place the pieces, one at a time, so that
eventually they would produce the correct pattern.
Such a sample pattern is shown in Fig. 1 (Ss worked
with unnumbered pieces, however).

In the instructions we always used the same “code” 
which described each piece as either horizontal or 
vertical, and specified its point of contact with some 
previously placed piece. For example, the instruc- 
tions to build Fig. 1 would be, “Place the first piece 
horizontally; place the second piece vertically with 
its top left corner touching the top right corner of 
the first piece; place the third piece horizontally
with its top right corner touching the bottom left
corner of the second piece; . . . place the eighth
piece vertically with its bottom right corner touching
the top right corner of the fifth piece; . . .” and
so on.

Whenever possible the instructions described the 
point of contact between a piece and the piece that 
had just been placed. However, in a number of 
problems “cut-backs” were deliberately introduced 
in which a piece had to be connected to some ear- 
lier piece in the sequence, as in the Piece 8 to Piece 
5 connection in Fig. 1. Since Ss never knew when 
such a cut-back would occur, they had to remain
alert for the numbers of pieces at all times, and had
to remember the order in which they had placed 
them. 
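The placement code can be made concrete with a short Python sketch. It assumes each piece is a 2 × 1 in. rectangle on a grid measured in inches, with y increasing downward so that "top" means the smaller y value. All function and variable names are hypothetical, and the Ss of course worked with physical pieces, not software. A cut-back is simply an instruction whose reference piece is not the one just placed.

```python
# Width and height in inches for each orientation (pieces are 2 x 1 in.).
SIZE = {"horizontal": (2, 1), "vertical": (1, 2)}


def corner_offset(width, height, corner):
    """Offset of a named corner ('top left', 'bottom right', ...) from a piece's top-left corner.

    x grows to the right and y grows downward, so 'bottom' adds the height.
    """
    dx = width if "right" in corner else 0
    dy = height if "bottom" in corner else 0
    return dx, dy


def place(layout, orientation, own_corner, ref_piece, ref_corner):
    """Append a piece whose own_corner touches ref_corner of piece number ref_piece (1-based).

    Each placed piece is stored as (x, y, width, height), (x, y) being its top-left corner.
    ref_piece may be any earlier piece, which is how a cut-back is expressed.
    """
    w, h = SIZE[orientation]
    rx, ry, rw, rh = layout[ref_piece - 1]
    rdx, rdy = corner_offset(rw, rh, ref_corner)
    odx, ody = corner_offset(w, h, own_corner)
    layout.append((rx + rdx - odx, ry + rdy - ody, w, h))
    return layout


# The first three pieces of the Fig. 1 instructions quoted above.
layout = [(0, 0, *SIZE["horizontal"])]  # first piece placed horizontally
place(layout, "vertical", "top left", 1, "top right")
place(layout, "horizontal", "top right", 2, "bottom left")
# layout is now [(0, 0, 2, 1), (2, 0, 1, 2), (0, 2, 2, 1)]
```

Storing every placed piece makes a cut-back cost the same as an ordinary step for the program, whereas the receiver, as noted above, has to remember the order in which earlier pieces were placed.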

The instructions were paced so that it required ap- 
proximately 12 sec. to describe the location of any 
one piece. During any given testing session Ss were 
required to complete five problems of this type.

We also used a second receiving task in which patterns
consisted of 25 pieces. The task differed from
the 10-piece one in that the instructions were given 
somewhat faster (10 sec. per piece) but there were 
no cut-backs. Each piece was connected to the one 
just placed, and Ss were informed of this condition 
before the problem began. During any given testing 
session Ss were required to complete one problem of 
this type. 


Fig. 1. Sample problem.

Sending Task 
The S was seated in front of a microphone. Using
numbered pieces, the experimenter (E) built a pattern
of the same type as used in the 10-piece receiving
problem and told S that he was to give instructions
to a hypothetical receiver who would listen to
the record and attempt to build the pattern from S’s 
instructions. The Ss were told to use the same code 
for sending as had been used in the receiving task. 
In no case did the pattern require S to give instruc- 
tions involving a cut-back, although some pieces 
touched more than one previously placed piece, and 
in such a case S had the option of describing one of 
several possible connections. The Ss were told that 
this choice was their own. They were further told 
that they could go as fast as they wanted to, but 
that they would be scored on accuracy. They were 
also told that if they made an error or omitted some- 
thing they should correct it or fill in the necessary 
information. During any given testing session Ss
were required to send five problems of 10 pieces each. 


Subjects 


The Ss were 20 volunteers from a medical research 
unit. This group was fulfilling its military obliga- 
tion by volunteering for medical research studies, 
and, therefore, was composed of very highly motivated
individuals.


Design of the Experiment 


The Ss lived on a hospital ward, 10 at a time. 
They were confined to this ward during the entire 
experiment but were permitted an otherwise normal
living routine. During the experimental period the
experimental Ss were closely guarded to insure that
they would remain awake at all times.

Each of the two groups of 10 men went through 
two phases: during Phase 1 five men served as ex- 
perimentals and went without sleep while the other 
five served as controls staying on a normal sleep 
schedule; during Phase 2 these two groups of five 
men switched their roles. It was not possible to as- 
sign the tasks to different groups in a completely 
balanced manner. Instead, the exigencies of the 
situation necessitated the assignment of our two re- 
ceiving tasks and our one sending task according to 
the schedule shown in Table 1. Thus, 10 different 
experimentals and controls received the 10-piece re- 
ceiving task, while in the case of the 25-piece receiv- 
ing task and the sending task, the same group of 10 
men served as both experimentals and controls in 
successive phases.

Each phase of the experiment consisted of a con- 
trol period of three to five days, an experimental pe- 
riod of 72 hr., and a recovery period of three to five 
days. Each S was tested twice during the control 
period, twice during the experimental period (at ap- 
proximately 55 hr. and 70 hr. without sleep), and 
once during the recovery period. We constructed 
three sets of problems for sending and three sets for 
receiving, and readministered these in the fourth and 
subsequent testing sessions. No Ss ever gave any in- 
dication of recognizing any of the patterns, however. 


Analysis of the Data 


The Ss’ accuracy in receiving was recorded by E 
during the testing session as each successive piece was 
placed. Every piece was scored as correct or incor- 
rect in relation to the previous pieces. Thus, the 
maximum score per session for the five 10-piece 
problems was 50, and for the one 25-piece problem
it was 25. The Ss’ accuracy of sending was scored
from the transcripts of the records. If any portion
of the instructions pertaining to a given piece was
incorrect, the entire piece was considered incorrect.
The maximum score per session for the five 10-piece
sending problems was 50.

Table 1
Experimental Schedule

           Phase 1                      Phase 2
Group A    Ss 1-5 Exp.                  Ss 1-5 Con.
           Ss 6-10 Con.                 Ss 6-10 Exp.
           Pilot testing                Receiving test (10 pieces)

Group B    Ss 11-15 Exp.                Ss 11-15 Con.
           Ss 16-20 Con.                Ss 16-20 Exp.
           Receiving test (10 pieces)   Receiving test (25 pieces)
           Receiving test (25 pieces)   Sending test
           Sending test

Sleep Deprivation in a Simulated Communication Task

Fig. 2. Accuracy of receiving in 10-piece problems.

The total time required by S to send each set of 
five problems was recorded during the test session.

In both receiving and sending, errors were classified
into three categories: (a) confusion of horizontal
and vertical; (b) confusion of top and bottom or
right and left; and (c) confusion concerning the
number of the piece to which the piece being placed
had to be connected (e.g., in Fig. 1, S might have
connected Piece 8 to Piece 7 instead of to Piece 5).
In the case of sending, errors were further categorized
as errors of omission or errors of commission
(actual false instructions). The number of errors
which Ss spontaneously corrected was also counted.

Fig. 3. Accuracy of receiving in 25-piece problems.

Fig. 4. Accuracy of sending.

In order to determine whether sleep deprivation 
effects were statistically significant, we obtained a 
control value for each S by averaging the scores of 
the control sessions just before and just after the 
deprivation sessions, and compared this to an ex- 
perimental value obtained by averaging the scores of 
the two experimental sessions. Since the control 
group was tested as many times as the experimental 
group, we compared changes in control group per- 
formances with the changes in the experimental 
group for statistical purposes. All statistical tests
were t tests, using the distribution of differences
between changes observed in the control and
experimental groups. Thus for each task there is one t
value, except where the same Ss participated as both
experimentals and controls in successive phases, in
which case separate values of t were obtained for
each phase.
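The change-score comparison can be sketched in Python. The sketch assumes that each S's change is his experimental value minus his control value, and that the experimentals' changes are compared with the controls' changes by an ordinary pooled-variance t test for independent groups. The paragraph above does not say whether the authors pooled variances or paired Ss, so that choice, the function names, and the toy scores are illustrative only.

```python
from statistics import mean, variance


def change_score(pre_control, post_control, session_55, session_70):
    """Experimental value minus control value for one S.

    Control value: mean of the control sessions just before and just after deprivation.
    Experimental value: mean of the two sessions in the deprivation period (about 55
    and 70 hr.); for a control S these are his sessions at the same times.
    """
    control_value = (pre_control + post_control) / 2
    experimental_value = (session_55 + session_70) / 2
    return experimental_value - control_value


def pooled_t(changes_exp, changes_con):
    """Two-sample t with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(changes_exp), len(changes_con)
    pooled_var = ((n1 - 1) * variance(changes_exp)
                  + (n2 - 1) * variance(changes_con)) / (n1 + n2 - 2)
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return (mean(changes_exp) - mean(changes_con)) / se, n1 + n2 - 2


# Hypothetical accuracy scores (maximum 50) for three experimentals and three controls.
exp_changes = [change_score(48, 50, 44, 42), change_score(47, 49, 45, 41),
               change_score(49, 50, 46, 43)]
con_changes = [change_score(48, 49, 49, 49), change_score(47, 48, 48, 48),
               change_score(49, 50, 49, 50)]
t_value, df = pooled_t(exp_changes, con_changes)
```

Comparing changes rather than raw scores is what lets the control group absorb practice effects, since both groups were tested equally often.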


Results 


The progressive decrement in accuracy of 
receiving and sending during the experimental 
period can be seen in Figs. 2, 3, and 4. Per- 
formance begins to decline at 55 hr. without 
sleep and reaches its low point at 70 hr. with- 
out sleep. In every case (except in Phase 1 
of the 25-piece receiving problems) the recovery
score not only reaches but surpasses the control
value of the pre-deprivation session, suggesting
that Ss were continuing to learn despite their
performance decrement.

As can be seen in Table 2, the decrement 
in receiving is significant for both kinds of 
problems, but the decrement in sending is not 
significant. If one compares performance at 
70 hr. without sleep with performance in the 
control session, the relative decrement in re- 
ceiving is 11%, and in sending is 7% and 6%. 

The time required by Ss to send their in- 
structions shows a progressive increase during 
each experimental phase and a recovery score 
which surpasses the previous control value 
(Fig. 5). The increase is statistically significant,
as can be seen in Table 2; Ss’ average
time at 70 hr. being 26% and 33% longer 
than during the previous control periods. 

A similar rise occurred in the number of 
spontaneously corrected errors, but only the 
result for the second phase reached statistical 
significance. At 70 hr. of sleep loss, Ss were 
correcting 170% and 127% more errors than 
in the previous control periods. 
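The percentages in this section are relative changes from the control value. A minimal Python sketch of that arithmetic, using made-up figures rather than the study's data:

```python
def percent_change(control_value, value_at_70_hr):
    """Relative change from the control value, in percent (negative = decrease)."""
    return 100 * (value_at_70_hr - control_value) / control_value


# Hypothetical figures, for illustration only.
accuracy_drop = percent_change(50, 44.5)  # -11.0, i.e. an 11% decrement
errors_rise = percent_change(2.0, 5.4)    # about +170
```

Note that "170% more" corrected errors means 2.7 times the control count, not 1.7 times.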

The analysis of receiving errors revealed 
that the most frequent error during the con- 
trol period involved mistakes in top-bottom 
or right-left instructions. Hardly any hori- 
zontal-vertical errors occurred. During the 
experimental period horizontal-vertical errors 
rose by only 35% (t = .84, not significant) 
while errors involving top-bottom or right-left
increased by 150% (t = 1.77, not significant) and
errors involving an incorrect connecting piece
increased by 250% (t = 2.95, significant at the
.01 level).

The analysis of sending errors showed that
Ss made more errors of commission than of
omission, and that both increased during sleep
deprivation, but none of the relationships
approached statistical significance. Errors of
omission tended to be primarily of the
horizontal-vertical type, while errors of commission
tended to be primarily of the top-bottom,
right-left type. All types of errors increased
to an equivalent extent during the experimental
sessions.

Table 2
Table of t Values

                              Value    Significance
                              of t     Level
Receiving Accuracy
  10-piece problems           2.80
  25-piece problems
    Phase 1                   2.48
    Phase 2                   3.30
Sending Accuracy
  Phase 1                     1.22     NS
  Phase 2                     1.72     NS
Sending Time
  Phase 1                     5.70
  Phase 2                     3.95     .01
Corrected Sending Errors
  Phase 1                     1.99     NS
  Phase 2                     2.33     .05


Qualitative Observations


Observation of Ss and inspection of the in- 
dividual scores revealed marked individual 
differences in both sending and receiving, 
ranging from no decrement to 40% decre- 
ment. A few Ss actually improved in ac- 
curacy when sleep-deprived, apparently be- 
cause they were better motivated when their 
ability to withstand stress was in question. 

Of some interest was the fact that performance
was more closely related to reported
feelings of fatigue than to how S looked. Usually
Ss who looked awake but said they were
tired showed more decrement than Ss who 
looked sleepy but said they felt fine. 







In the sending task, Ss showed some altera- 
tion in their verbal behavior. When sleep-de- 
prived they slowed down noticeably, dropped 
the intensity of their voice, paused for long 
intervals without apparent reason, enunciated 
very poorly or mumbled instructions inau- 
dibly, mispronounced, slurred or ran words 
together, and repeated themselves or lost their 
place in the sequence of pieces despite the fact 
that they were numbered. 

The Ss varied in their preference for send- 
ing and receiving but their stated preference 
was not related to accuracy on the task. 


The Relation of Performance to Intelligence 


The mean intelligence level (Aptitude Area 
I score of the Army Classification Battery) of 
Ss was 118.6 with a standard deviation of 
16.0. In order to obtain some clue concern- 
ing the large individual differences, we corre- 
lated the Aptitude Area I scores with perform- 
ance during the control period and with the 
amount of decrement shown during the experimental
period. Control performance is
correlated positively but not significantly with
intelligence for both sending and receiving
(rho = +.27, +.44). Most Ss performed almost
perfectly during the control period, thus
curtailing the range of accuracy scores. The
amount of decrement in accuracy is negatively
correlated with intelligence in the case of
receiving (rho = -.40, not significant); in the
case of sending, however, a significant positive
correlation is obtained (rho = +.80, significant
at the .05 level). In other words, in the
sending task, Ss with higher intelligence test
scores showed greater decrement.
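Rank-order correlations like the rho values above can be computed by ranking both variables (tied values share the mean of their ranks) and taking the ordinary correlation of the ranks. A minimal Python sketch; the function names and the toy scores are hypothetical, not the study's data:

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank lists."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical Aptitude Area I scores and sending decrements (% points) for five Ss.
aptitude = [98, 110, 118, 125, 140]
decrement = [0, 2, 2, 6, 9]
rho = spearman_rho(aptitude, decrement)  # about 0.97
```

A positive rho here would mean, as in the sending task, that higher aptitude scores go with larger decrements.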


Discussion 


Both sending and receiving require a relatively
rapid processing of information. However,
the high level of performance exhibited
by all Ss during the control period indicates 
that the task is one that can be learned even 
by Ss of average or below average intelligence. 
Once the code has been mastered and over- 
learned, adequate performance depends pri- 
marily on a continuous output of attention 
and concentration. The decrement observed 
during the experimental period is very likely
the product of a decreased ability of Ss to
maintain the necessary level of attention and 
concentration, rather than the product of 
some alteration in their capacity to process 
the information or a decrease in motivation. 
The intra-individual differences observed dur- 
ing the experimental period and Ss’ own state- 
ments concerning their inability to concen- 
trate when sleepy or tired both argue in this 
direction. 

The fact that errors involving connections 
increased most would suggest that Ss have 
the most difficulty if they have to maintain 
a set for information that may or may not 
arrive. In the 10-piece receiving problems, 
Ss never knew when cut-backs would occur, 
yet they had to remain alert for them. In a 
number of cases Ss erred by thinking they 
had heard cut-back instructions when none 
had occurred. Almost all Ss stated that the 
25-piece problems were much easier because 
there were no cut-backs. 

The amount of decrement shown in the 
deprivation period is surprisingly small, even 
though statistically significant in the case of 
receiving. Even at 55 hr. some Ss were per- 
forming at almost their control level. 

The finding that receiving seems to be af- 
fected more than sending is suggestive, though 
it must be interpreted with caution. The 
tasks were not entirely comparable because 
Ss were allowed to select their own pace in 
sending whereas the pace was fixed in receiv- 
ing. The fact that sending time and the num- 
ber of corrected errors rose significantly dur- 
ing the experimental period suggests that Ss 
maintained their sending accuracy at the ex- 
pense of time and by correcting errors. In 
the receiving task there was no opportunity 
to correct errors because Ss could not replay 
any portion of the tape, and could not slow 
down the tape if it moved too fast for them. 

One might infer from this interpretation 
that a sleep-deprived S can perform ade- 
quately if given some choice in the pacing of 
his performance. The correlations with in- 
telligence cast some doubt on such an infer- 
ence, however. On the sending task it was 
the most intelligent men who showed the 
greatest decrement in accuracy. If self-pacing
were an important factor, one might expect
the more intelligent individual to be more
successful in compensating for his own fatigue 
by pacing himself. Instead, what appeared 
to happen, as one watched Ss, was that the 
more intelligent men viewed the task as being 
too easy, became overconfident, and attempted 
to go faster than their level of fatigue per- 
mitted; when they became aware of making 
errors they blocked, slowed up, corrected some 
of them, and then proceeded once again to go 
as fast as possible. The Ss whose perform- 
ance showed the least decrement and who were 
of average or below average intelligence never 
became overconfident and did not seem to be 
bothered by a self-imposed time pressure.
During the experimental period they deliberately
slowed down, thought more, and spoke
with greater deliberation. 


Edgar H. Schein 


Summary 


Twenty Ss were tested for their ability to 
receive and send complex instructions in a 
simulated communication situation following 
55 and 70 hr. without sleep. The ability to 
receive showed a significant decrement, but 
the ability to send did not. The time re- 
quired to send instructions and the number 
of errors corrected spontaneously increased 
significantly. In the case of sending, high- 
intelligence Ss showed greater decrement than 
low-intelligence Ss. 


Received October 1, 1956. 
One possible solution to overcrowded classrooms
is educational television. In general,
studies of teaching by television have shown 
it to be successful. Wischner and Scheier (7) 
summarize research results in these words: 


What have we learned? A major conclusion warranted
by all of the research findings is: TV can teach.
Within the range of subject matters and student
groups investigated, TV groups generally learn as
well as regular instruction groups. In some
instances TV groups achieve significantly better than
their controls. With respect to retention measures, 
TV groups do as well as regularly instructed groups 
(7, p. 613). 


In addition to research relating to learning 
by television, some studies of short-term re- 
tention have been reported. Rock, Duva, and 
Murray (5), Kanner, Runyon, and Desiderato 
(3), Paul and Ogilvie (4), and Williams (6) 
all report results favorable to television. Of 
the retention of such material over a long pe- 
riod of time, however, little is known. Is ma- 
terial learned via television more ephemeral 
than that acquired in the classroom? 

The purpose of this study was to measure 
retention of material learned by various means 
of instruction, including television, after a 
time lapse of three years. 

The original study on which this follow-up 
was based was reported by Husband (2). He 
compared the effectiveness of teaching intro- 
ductory general psychology by television with 
three other teaching conditions: a studio class, a
kinescope class, and classroom instruction in two
traditional campus classes. All sections had the same
instructor, text, and tests. The test grades at
the end of the term showed little difference
among the groups.


1 Data from a thesis presented by Reba Patterson
to the faculty of the graduate college of Iowa State
College in partial fulfillment of requirements for the
degree of Master of Science, 1956.
Procedure


In the original study, Husband used 302 subjects.
Fifty-four of these took the course by television at
home and 248 were enrolled in campus classes.

A total of 83 subjects was retested. Forty of the
54 television students were located in the state and
retested; this subgroup appeared to be representative
of the original group. Forty-three of the campus
students were retested. This constituted about 20% of
the original total. Twenty-two of the original 145
classroom students, 11 of the 50 TV-in-studio
subjects, and 10 of the original 53 kinescope students
were retested.

During the three-year time lapse, many of the 
campus students who took part in the original study 
left college as graduates, transfers, or drop-outs. To 
be sure that students available for testing were not 
a biased sample of the original group, differences
between the mean ACE scores and the mean high
school averages of tested and unavailable subjects
were analyzed by t tests. These differences were not
significant at the 5% level.

The instrument used for retesting was the original
final examination. It consisted of 125 multiple
choice items which covered all topics taught during
the term.

Statistical significance of the difference in rate of
retention among the four groups (television, studio,
classroom, and kinescope) was tested. Since this
analysis involves repeated measurements on the same
subjects, this was taken into account as a source
of variation and controlled (1, Ch. 15). The problem
of correlated measures was obviated by using
difference scores. The significance of the mean
difference between groups, of the mean difference
between the 1953 and the 1956 test scores, and of the
interaction between tests and groups was tested using
the F statistic from the appropriate analysis of
variance. To study the differences in subject
retention as influenced by psychology courses studied
since 1953, the same kind of analysis was repeated.
The possibility of a relationship between level of
initial score and amount of material retained was
investigated by testing the significance of the
difference between group mean differences.
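Working with difference scores turns the comparison of retention among groups into a one-way analysis of variance on each subject's loss (1953 score minus 1956 score). With only two testing occasions, that one-way F on the differences equals the groups × tests interaction F of the corresponding repeated-measures analysis. The Python sketch below computes it by hand. The function name and the toy scores are illustrative; the source does not give the authors' computational layout.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F for a list of groups of scores; returns (F, df_between, df_within)."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical (1953, 1956) score pairs for three small groups.
pairs = {
    "television": [(90, 76), (85, 74), (88, 77)],
    "classroom": [(87, 75), (86, 76), (89, 76)],
    "kinescope": [(91, 76), (90, 77), (92, 78)],
}
losses = [[s53 - s56 for s53, s56 in g] for g in pairs.values()]
f, df_b, df_w = one_way_f(losses)
```

The F would be referred to an F table with `df_b` and `df_w` degrees of freedom.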


Results 
Retention Influenced by Teaching Method 


To determine individual retention, each subject’s
total of correct responses on his retest
was compared with his original score on the
examination taken in 1953. The amount forgotten
is expressed as the difference between the two
scores. The results are shown in Table 1. Three
analyses were made. An F test showed that the
differences between groups were not significant at
the 5% level, nor was the interaction between test
and group significant. The third test, on the
difference in performance on the 1953 test and
the 1956 test, was highly significant.

Table 1
Mean Scores and Differences in Retention According to Teaching Method

Group          n    Mean Score 1953   Mean Score 1956   Mean Differences
Television     40   88.5              75.6              -12.9
Classroom      22   87.2              75.6              -11.6
TV-in-studio   11   83.0              72.7              -10.3
Kinescope      10   90.4              76.0              -14.4
Total          83


Retention Influenced by Study Since 1953 


It seemed possible that differences among
campus students might have been influenced 
by the number of psychology courses studied 
since 1953. Accordingly, students were di- 
vided into four groups—those with no psy- 
chology courses since the original, those with 
one, those with two, and those with three or 
more. (The highest number of courses any 
student had taken was seven.) The results 
of this division are shown in Table 2. 

An F test showed no significant difference 
between these four groups at the 5% level.
Again the difference in performance on the 
1953 test and the 1956 test was highly sig- 
nificant. The interaction between tests and 
groups was significant at the 5% level. For 
the first three groups the interaction was 
about the same. For the fourth group (those
students who had taken three or more psychology
courses), the loss over three years as
seen in the mean difference was about half
that of the other three groups. Despite the 
difference in loss, the fourth group still scored 
lowest on both tests. 


Reba Patterson Benschoter and Don C. Charles 


Table 2
Mean Scores and Differences in Retention According to
Number of Psychology Courses Since 1953

Group                    n    Mean Score 1953   Mean Score 1956   Mean Differences
No Courses               19   87.1              74.3              -12.8
One Course               7    89.6              77.3              -12.3
Two Courses              11   90.1              77.5              -12.6
Three or More Courses    6    77.3              69.7              -7.6
Total                    43





The non-campus, television students were 
divided into two groups—those who had done 
some academic work since the original courses, 
and those who had done none. A nonsignifi- 
cant difference of less than two points ap- 
peared between these groups. 


Retention Influenced by Original Score 


All the subjects, regardless of teaching 
method, were grouped according to their origi- 
nal scores on the test. Experimental findings 
in this area of learning generally indicate that 
high scorers on tests over meaningful mate- 
rial retain more than low scorers. 

After dividing the subjects into three 
groups, a t test was made to compare the
low-scoring group (original test score 60-80) 
and the middle group (original score 81-100). 
Results showed that the low-score group had 
lost least and the difference was significant at 
the 1% level. These findings are summed up 
in Table 3. 


Table 3
Mean Scores and Differences in Retention According to Original Score

Group            n    Mean Score 1953   Mean Score 1956   Mean Differences
Score 60-80      17   73.00             70.47             -2.53*
Score 81-100     56   89.04             74.98             -14.06
Score 101-120    10   104.80            85.50             -19.30
Total            83

* Student's t comparing lowest and middle range groups = 4.87, significant at the 1% level.
Because the lowest group was very small,
another approach was tried. All 83 subjects 
were divided at the median, and the test-re- 
test means of the upper and lower halves were 
analyzed. Again the difference between the 
mean differences was significant at the 1% 
level. Those with lower original scores had 
lost less. 


Discussion 


The results of this study offer more sup- 
port for television as an effective instructional 
instrument. However, further examination of 
some aspects of this and similar studies is 
warranted. 

One question which should be raised has to 
do with the measuring instrument used. What 
exactly has been measured? The test items 
used in this study seem to measure the amount 
of factual material acquired from text and 
lecture rather than general understanding and 
application of principles. Certainly the ac- 
quisition of factual knowledge is essential and 
objective tests are ideally suited to the meas- 
urement of such material. We do not know, 
however, whether the use of another kind 
of test—essay, oral, situational—would have 
given different results, or whether a different 
emphasis in test content would have made 
changes. It can only be said that the test 
was typical of those widely used to measure 
achievement in courses of this kind. 

A problem in all longitudinal studies is sample
shrinkage. In this study, differences between
tested and unavailable subjects were
statistically insignificant for the factors 
measured. The small size of some of the 
groups should be considered, however, when 
drawing conclusions. 

The fact that subjects with lower original
scores lost less than original high scorers
contradicts the principle that amount of retention
is positively related to amount of learning.
One possible explanation is regression toward the
mean: because any single test score contains some
measurement error, those who scored highest in 1953
would be expected to score closer to the mean on
retest, and high scorers have more to lose than low
scorers. Another possibility is that because of the
factual nature of the test, some students may have
benefited from “cramming” for the 1953 final but,
lacking understanding of principles, may have
forgotten quickly. It is questionable whether
these explanations account fully for the differences
in loss shown in Table 3. This aspect of long-term
retention needs further study.

It should be noted that not all individuals
showed a loss in score on retest, even though
there were mean losses for all groups. The
changes in test scores ranged from +17 to
-37. Most of the increases in test scores
occurred in those subjects whose original
scores were below the median.
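Regression toward the mean alone can produce a pattern like that of Table 3. The Python simulation below is an illustration under made-up assumptions, not a model of these data. Every simulated student forgets the same 12 points of "true" score, and each test adds independent measurement error. Yet when students are split at the 1953 median, the upper half shows the larger apparent loss. The parameters (true-score mean 85, sd 10, error sd 6) and all names are hypothetical.

```python
import random
from statistics import mean, median


def simulate_losses(n=5000, true_mean=85.0, true_sd=10.0, error_sd=6.0,
                    forgetting=12.0, seed=1):
    """Return (mean loss of lower half, mean loss of upper half), split at the first-test median.

    Every simulated student loses exactly `forgetting` points of true score;
    observed scores add independent normal measurement error on each occasion.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        true = rng.gauss(true_mean, true_sd)
        first.append(true + rng.gauss(0, error_sd))
        second.append(true - forgetting + rng.gauss(0, error_sd))
    cut = median(first)
    lower = [a - b for a, b in zip(first, second) if a <= cut]
    upper = [a - b for a, b in zip(first, second) if a > cut]
    return mean(lower), mean(upper)


low_loss, high_loss = simulate_losses()
# The upper half shows a larger mean loss than the lower half, although the
# simulated true forgetting is identical for everyone.
```

The size of the gap depends on how much of the score variance is error, so the simulation shows only that the direction of the Table 3 result is compatible with regression, not that regression accounts for its size.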


Summary and Conclusions 


This study, based on Husband's experiment 
of 1953, attempted to analyze three-year re- 
tention of psychology subject matter taught 
by television and by traditional methods. Scores
on a readministration of the final examination
given in 1953 were compared with the original scores
to give a measure of retention. The results
were as follows:


1. A comparison of the mean differences in 
retention among the four groups—television, 
classroom, TV-in-studio, and kinescope— 
showed that there was no significant differ- 
ence in amount remembered by these groups. 

2. When campus students were categorized 
according to the number of psychology courses 
taken since 1953, and their differences in retention
analyzed, it was found that there was
a significant interaction. Those with
three or more courses retained more than
those students who had taken no, one, or two
psychology courses. However, the group with
the most courses scored lowest on both the
1953 and 1956 tests. When television students
were classified on the basis of whether
they had or had not done any formal studying
since 1953, it was found that there was little
or no difference in retention between the two
groups.


3. When the retention of all the subjects 
was considered on the basis of their original 
scores, it was found that those with the low- 
est original scores lost less than higher scorers. 

The results of this study indicate that, 
within the limitations and restrictions pointed 
out in the Discussion, long-term retention of 
academic material learned by television in- 


Some modern data-processing systems re-
quire an operator to read out items in se- 
quence from a memory or store under pres- 
sure of speed and accuracy. Such tasks could 
be designed to be self-paced, rate-paced, or, 
as proposed in this study, self-paced by groups 
of items. 

In a self-paced task, timing of the signal 
presentation is controlled by the operator's 
individual rate of responding; he is given a 
new signal only after he has responded to the 
signal before him. Sorting reference cards 
into a card file is an example of a familiar 
self-paced task. 

On the other hand, it is characteristic of 
rate-paced tasks that the operator has little 
or no control over the timing of the signals, 
but must respond so as to keep up with an 
externally determined rate of presentation. 
Tracking on a PPI is a rate-paced task in that 
the scanning speed of the antenna fixes the 
rate at which targets are presented. 

In this experiment, a combination of these
two modes of operation was used. A short
sequence of signals was presented at a
predetermined rate. After observing the sequence
of signals, the operator was required to re- 
spond to each item of the sequence. When 
he had completed his responses to the group 
of signals, he was then given the next se- 
quence. Hence, the signal presentation was 
rate-paced by items within the group, but the 
rate at which the groups or subsequences ap- 
peared was determined by the operator's in- 
dividual rate of responding, i.e., the task was 
self-paced by groups. 

Information is available comparing rate- 
paced and self-paced tasks (3) when signals 
are presented singly, but very little experi- 
mental data is available on tasks requiring 
rapid responding to groups of signals (2). In 
general, with single items, self-pacing permits
better performance than rate-pacing. The
problem for this study was to compare several
conditions of responding to groups of
rate-paced signals with a control condition of
self-paced single-item presentation. There is
some reason to believe that systems which
combine forced-signal presentation with
opportunity for intermittent operation may
result in superior performance, since the
operator is able to organize his responses into
short, integrated patterns (5, 6). The
circumstances under which reduction coding
may be used to advantage are largely
undefined, and this study represents only a
beginning at demonstrating the effects of a few
stimulus presentation parameters.

1 Now at the Advanced Electronics Center,
General Electric Company, Ithaca, New York.


Method 


A 5 × 5 matrix of display lights mounted in front
of and above a similar push-button control panel
was selected as the experimental system. Previous
work had shown all items to be of nearly equal
difficulty, and it was thought that the spatial properties
of this arrangement would allow greater opportunity
for integrating sequences into patterns.

Six naval enlisted men served as Ss, although one
man did so poorly on all grouped conditions that
inclusion of his data was not warranted. The Ss were
instructed to wait until all lights in a group had been
presented and then to respond as quickly and as
accurately as possible, preserving order if they could,
but guessing if necessary. An experimental session
consisted of a block of four trials with a fixed number
of items per group. The rate of presentation
was different for each trial within the block. Total
time and error scores were taken and plotted following
each trial so the Ss could observe their progress
in comparison with one another.

A trial consisted of 60 signals .1 sec. in duration,
presented in groups of two, three, or four items
(hereafter these conditions are designated by II, III,
and IV, respectively). The interstimulus intervals
were .37, .52, .68, or 1.02 sec. One trial per day was
given on each of these 12 conditions for 18 days.
The orders of presenting the conditions were
systematically varied for each S each day. A self-paced
control trial was given preceding and following
each day's work. The mean measures of these
trials were used as the base for subsequent comparisons.

An automatic programmer (4) generated eight different
random runs and grouped and paced the 60
signals according to the requirements of the
experimental design. The time at which each signal or
response occurred, along with its identity, could be
recorded, although these detailed data were taken only
on the last three trials. For Trials 1 through 15,
total time and errors (not including errors in order)
were the only measures taken.
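The structure of a trial described above can be sketched in a few lines. The automatic programmer (4) is not described in this section, so this is only an illustration under stated assumptions: lights are numbered 0 to 24 row by row across the 5 × 5 matrix, each signal is drawn uniformly at random with repeats allowed, and the names `random_run`, `group_run`, and `light_position` are hypothetical.

```python
import random

LIGHTS = 25   # 5 x 5 display matrix; lights numbered 0..24 row by row (assumed)
ROW_LEN = 5


def random_run(n_signals=60, seed=None):
    """One hypothetical random run: each signal is one of the 25 lights,
    drawn uniformly with repeats allowed (an assumption)."""
    rng = random.Random(seed)
    return [rng.randrange(LIGHTS) for _ in range(n_signals)]


def group_run(run, group_size):
    """Split a run into consecutive groups (2, 3, or 4 items for II, III, IV)."""
    if len(run) % group_size:
        raise ValueError("run length must be a multiple of the group size")
    return [run[i:i + group_size] for i in range(0, len(run), group_size)]


def light_position(index):
    """(row, column) of a light, which is also the matching button position."""
    return divmod(index, ROW_LEN)


# Eight hypothetical runs, then one of them grouped for condition III
runs = [random_run(seed=s) for s in range(8)]
groups = group_run(runs[0], 3)   # 20 groups of three signals
print([light_position(light) for light in groups[0]])
```

Because 60 is divisible by 2, 3, and 4, every condition uses whole groups: 30, 20, and 15 groups respectively.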


Results 


In presenting the results of this study, 
measures relating to the total sets of 60 sig- 
nals are considered first. These measures are 
the total times, errors, and rates of informa- 
tion transmission. An analysis of the re- 
sponses to the individual groups of items is 
also given. As stated above, the results are 
based on only five of the original six Ss. 

Total time measures. Figure 1 shows the 
average time required to transmit 60 signals 
for each condition on successive trials. The 
course of learning seems about the same for 
all conditions. The data for the last three 
criterion trials appear reasonably stable as 
shown by tests for trend (7) over Trials 16,
17, and 18, which proved significant only for
the IV conditions combined and for II-.68
individually (p = .05).

The curves in Fig. 2 demonstrate that to- 
tal time is directly related to the number of 
signals per group and the interval between 
signals. 

Table 1 shows that the increases in total
time mainly reflect the increases in time taken
to present grouped stimuli; the responding or
“punch-out” times remain fairly constant. It
can be seen that the time consumed in
presenting grouped signals is substantially greater
in every case than that taken up in presenting
the single self-paced signals. However,
even allowing for the extra time used to
present signals, the responses were organized so
as to effect a significant over-all saving in
time for condition II-.37 (7) (p < .01) and
no loss in total time for III-.37. All the
other conditions were slower than the
self-paced condition. In fact, for condition
IV-1.02, the time taken up by signal presentation
was nearly as great as the average total time
for the self-paced control.
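The "Time to Present Stimuli" column of Table 1 can be approximated by a simple timing model. The model is an inference, not something stated in the text: it assumes each signal is displayed for 0.1 sec (the self-paced row shows 6.0 sec to present 60 signals) and that the interstimulus interval separates successive signals only within a group. Under that assumption a trial with groups of k items contains 60/k groups and 60 - 60/k within-group gaps, so presentation time = 60 × 0.1 + (60 - 60/k) × interval. The function names are hypothetical.

```python
SIGNAL_DURATION = 0.1   # sec per signal: an assumption inferred from the SP row
N_SIGNALS = 60


def presentation_time(group_size, isi, n_signals=N_SIGNALS, duration=SIGNAL_DURATION):
    """Seconds spent displaying one trial under the assumed timing:
    every signal is lit for `duration`, and consecutive signals inside a
    group are separated by a gap of `isi` seconds."""
    if n_signals % group_size:
        raise ValueError("group size must divide the number of signals")
    n_groups = n_signals // group_size
    within_group_gaps = n_signals - n_groups   # (group_size - 1) gaps per group
    return n_signals * duration + within_group_gaps * isi


def total_time(group_size, isi, punch_out):
    """Total trial time = presentation time + measured punch-out time."""
    return presentation_time(group_size, isi) + punch_out


# Model versus a few printed "Time to Present" entries (group size 1 = self-paced)
for label, k, isi, reported in [("SP", 1, 0.0, 6.0), ("II-.37", 2, 0.37, 17.1),
                                ("III-.52", 3, 0.52, 26.8), ("IV-1.02", 4, 1.02, 51.9)]:
    print(f"{label:8s} model {presentation_time(k, isi):5.1f}  reported {reported}")
```

The model reproduces most of the column (for example 17.1 sec for II-.37 and 51.9 sec for IV-1.02), but it gives 33.2 sec for III-.68 and 29.4 sec for IV-.52 against 32.2 and 27.9 in the printed table, so it should be read as an approximation rather than as the apparatus's actual timing.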

Error measures. Complementary to the to- 
tal time measures are the error counts, which 
were obtained from the detailed records of the 
last three trials and which include inversions 
in order as well as extraneous responses. The 
percentage of responses in error for each con- 
dition, based on pooled data for the five Ss 
on the last three trials, is shown in Fig. 3. 
It can be seen that the error level is quite low 
even in the worst condition. Errors tend to
increase more rapidly as a function of the size
of the group than as a function of the speed
of signal presentation. Conditions II-.37 and
III-.37, which compare favorably with the 
self-paced condition in terms of total time 
measures, are inferior to it in terms of the 
error counts. 

Information measures. The conditions are 
compared in Fig. 4, using the average rate of 
transmission of information as a measure which
takes into account both time and error fac- 
tors. In computing transmission rates, data 
were pooled over trials for each S and then 
averaged over Ss. Each signal was consid- 
ered as an independent message, i.e., the in- 
formation contributed by the patterning of 
the signals was ignored. It was also as- 
sumed that when an error occurred, any of 
the remaining 24 possible responses were 
equally likely. These restrictions result in an 
underestimation of the transmission rates; 
the values computed can only be used for 
rough comparisons not amenable to statisti- 
cal tests of significance. 
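The transmission-rate computation can be sketched by taking the stated assumptions literally: each of the 25 lights is an independent, equally likely message (log2 25, about 4.64 bits), and an erroneous response is equally likely to be any of the other 24 lights. Information transmitted per signal is then log2 25 + (1 - p) log2(1 - p) + p log2(p/24), where p is the error proportion. The function names are hypothetical, and because the study pooled over trials for each S before averaging over Ss, this per-condition version will not reproduce the reported figures exactly.

```python
from math import log2


def bits_per_signal(p_error, n_alternatives=25):
    """Information transmitted per signal, in bits, assuming equiprobable
    independent signals and errors spread evenly over the n - 1 wrong lights."""
    if not 0.0 <= p_error < 1.0:
        raise ValueError("p_error must be in [0, 1)")
    h_max = log2(n_alternatives)
    p_ok = 1.0 - p_error
    # Equivocation: uncertainty left about the signal once the response is known
    equivocation = -p_ok * log2(p_ok)
    if p_error > 0:
        equivocation -= p_error * log2(p_error / (n_alternatives - 1))
    return h_max - equivocation


def transmission_rate(p_error, total_time, n_signals=60):
    """Bits per second for one trial of n_signals completed in total_time sec."""
    return n_signals * bits_per_signal(p_error) / total_time


print(f"{bits_per_signal(0.0):.2f} bits per error-free signal")
print(f"{transmission_rate(0.0, 53.1):.2f} bits/sec for an error-free 53.1-sec trial")
```

With zero errors, a 53.1-sec trial (the self-paced mean total time) would yield about 5.25 bits per sec; the reported self-paced value of 5.19 bits per sec is consistent with a small nonzero error rate.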

The average rate for II-.37 was 5.32 bits
per sec. and is the only value greater than
that for the self-paced condition, 5.19 bits
per sec. This indicates that, as far as
information measures are concerned, the increase
in responding rate more than compensated
for the errors in the case of II-.37. Because of
the underestimation involved in computing
the transmission rates, the other cases are
ambiguous; they may be either equivalent or
inferior to the self-paced condition.

Table 1

Total Time Components (Seconds)

                 Mean Total     Time to      "Punch-Out"
                 Time per 60    Present      Time
Condition        Stimuli        Stimuli
SP                  53.1           6.0          47.1
II    .37           50.6          17.1          33.5
      .52           55.6          21.6          34.0
      .68           60.4          26.4          34.0
      1.02          67.1          36.6          30.5
III   .37           53.0          20.6          32.4
      .52           59.8          26.8          33.0
      .68           67.2          32.2          35.0
      1.02          77.2          46.8          30.4
IV    .37           59.0          22.6          36.4
      .52           64.0          27.9          36.1
      .68           74.0          36.6          37.4
      1.02          89.2          51.9          37.3

Response sequence measures. Each sequence
of responses to the groups of signals
was analyzed into (a) an initial delay period,
RT, taken as the interval between the
onset of the last signal and the initiation of
the first response, and (b) the interresponse
intervals, IRI₁, IRI₂, and IRI₃, measured
between the onset times of successive responses.
Table 2 summarizes the data on these
components in terms of means based on five Ss
over the last three trials.²
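The decomposition into RT and interresponse intervals follows directly from the definitions above. In this sketch, times are in seconds, the helper names are hypothetical, and the timestamps used in the example are invented.

```python
def response_components(last_signal_onset, response_onsets):
    """Split one group's responses into the initial delay (RT, last signal
    onset to first response onset) and the interresponse intervals (IRIs,
    onset to onset of successive responses)."""
    if not response_onsets:
        raise ValueError("a group needs at least one response")
    rt = response_onsets[0] - last_signal_onset
    iris = [later - earlier
            for earlier, later in zip(response_onsets, response_onsets[1:])]
    return rt, iris


def mean_components(groups):
    """Average RT and each IRI position over groups of the same size.
    `groups` is a list of (last_signal_onset, response_onsets) pairs."""
    rts, iri_lists = [], []
    for onset, responses in groups:
        rt, iris = response_components(onset, responses)
        rts.append(rt)
        iri_lists.append(iris)
    n = len(rts)
    mean_rt = sum(rts) / n
    mean_iris = [sum(column) / n for column in zip(*iri_lists)]
    return mean_rt, mean_iris


# Invented timestamps for one group of three signals
rt, iris = response_components(2.00, [3.00, 3.40, 3.82])
print(f"RT = {rt:.2f} sec, IRIs = {[round(x, 2) for x in iris]}")
```

A group of k items yields one RT and k - 1 IRIs, which is why groups of two contribute only IRI₁ while groups of four contribute IRI₁ through IRI₃.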

The only variable affecting the RT meas- 
ures is the number of items per group. As 
more signals are presented, the initial delay 
period increases. No significant changes are 
attributable to rates of signal presentation 
used in this study. The RT values are
consistently lowest for the slowest rate of
presentation, but this is due mainly to one S
who responded differentially to this rate.

2 For detailed analysis of variance treatment of
these data see: Knowles, W. B., and Newlin, E. P.
Coding by Groups as a Mode of Stimulus Presentation,
NRL Report No. 4604, 1955.

The relations among the IRI measures are 
somewhat more complex, although similar to 
the RT results. The general picture is that 
as the number of items per group increases, 
the intervals between responses lengthen, 
while rate of presentation remains a rela- 
tively unimportant variable. However, in the 
IV conditions, longer interresponse intervals 
are associated to a significant degree with 
slower rates of signal presentation. With the 
number of items fixed, the rate of responding
is constant, so that for conditions with three
items per group IRI₁ and IRI₂ are equivalent,
and for the IV conditions IRI₁, IRI₂, and IRI₃
are essentially equal across the various
rates of signal presentation.


Discussion 


Two major points are demonstrated by
this experiment. First, most of the grouped
conditions resulted in longer total times than
the self-paced, single-item mode. The values
for II-.37 show, however, that “recording by
groups,” even though it requires the operator
to “store” signals and wastes time in signal
presentation, can still result in a more
efficient system than a self-paced, item-by-item
system if a small number of signals is
presented at a sufficiently high rate. Second,
over the range of conditions in which this
mode of operation might be used to advantage,
the number of items or amount of information
presented is more important in determining
the organization of the responses
than is the rate of presentation.

Table 2

Mean Value of Response Components in Seconds

                Interstimulus
Group           Interval        RT      IRI₁    IRI₂    IRI₃
II (N = 400)    .37             .84     .34
                .52             .83     .36
                .68             [?]     .36
                1.02            .72     .35
III (N = 300)   .37            1.00     .39     .45
                .52            1.00     .41     .40
                .68             .98     .41     .42
                1.02            [?]     .41     .42
IV (N = 225)    .37            1.26     .47     .45     .45
                .52            1.23     .49     .48     .45
                .68            1.21     .52     [?]     .50
                1.02           1.15     .52     .54     .49

A number of considerations should be taken 
into account in evaluating these conclusions. 
For one thing, it is doubtful that the shortest 
interstimulus interval used in this study, .37 
sec., represents the optimum pacing value; as 
a guess, an interval of about .20 sec. would 
probably have resulted in the greatest savings 
in total time. 

It is also important to realize that the dis- 
play-control relationships pertaining in this 
study were specifically chosen to permit the 
use of “spatial coding,” i.e., the system was 
such that the signals could be identified as 
points in the display space and the buttons 
could be located as corresponding points in 
the control space (1). Under these circum- 
stances, the Ss reported that they were able 
to pattern the signals into “lines” with two 
lights and “triangles” with three lights. Sys- 
tems may be designed in such a manner that 








Reduction Coding 


the operator is forced to use what has been 
called “verbal coding,” i.e., identification of 
the signals and responses in terms of names, 
letter and number tags, or other symbolic rep- 
resentation (1). It is doubtful whether the 
relationships found in the present study would 
hold for systems in which the operator must 
employ such verbal codes although, if the ver- 
bal material were carefully designed and pre- 
sented auditorily, considerable savings through 
reduction coding should be possible. This is 
an issue that could be considered in another 
study. 

Several findings suggest other leads for further
study into the ways in which coding by
groups can be made more effective. For
example, the learning curves in Fig. 1 suggest
that with continued practice, more of the
experimental conditions might have surpassed
the self-paced condition. It may be that the
stability of the data over the three criterion
trials represented a plateau or temporary
steady state. Another lead is provided by
analyses which show that there were very
pronounced differences between Ss, both in
over-all performance levels and in differential
performances on the various conditions. This
suggests that some Ss adopted more efficient
methods of handling the grouped signals;
perhaps with instruction and coaching, much of
this intersubject variability could be reduced.

A few observations which seem to be
related to the operator’s capacity for storing
information were noted. During preliminary
trials, it was found almost impossible to
handle as many as five items per group. In the
present situation, a group of five signals rep- 
resents an input of about 24 bits. It is inter- 
esting to compare this figure to the findings 
in classical studies on immediate memory 
span (8, p. 703) where the average number 
of digits that can be repeated accurately is 
about seven or eight, corresponding to the 
transmission of from 20 to 25 bits. During 
the experiment, Ss reported particular diffi- 
culty with groups of even four items. The 
patterns they tried to form out of the sig- 
nals “kept getting messed up.” This diffi- 
culty apparently was not experienced in 
handling two or three items. 

Again, in relation to the difficulty experienced
in handling the groups of four items, it
was found that rate of signal presentation
enters as a variable in a rather peculiar way:
the intervals between responses lengthened as
the rate of signal presentation became slower.
It is tempting to speculate that the relatively
long sequence of signals presented at regular
intervals carried with it a rhythmic pattern
which influenced the responding rates. No
evidence for this kind of effect was seen in
any of the other conditions.


Summary 


Using a 5 × 5 matrix of lights and a similar
panel of push buttons, five Ss were required
to respond to sequences of 60 signals
presented in subsequences of two, three, or
four signals per group with intervals of .37,
.52, .68, or 1.02 sec. between items.
Performances on these 12 conditions were
compared with a self-paced control condition in
terms of speed and accuracy measures. The
following results were obtained:

1. With the fastest rate of stimulus
presentation, conditions with two and three items
per group were, respectively, faster than and
equivalent to the self-paced condition in terms
of total transmission time for 60 items.

2. All experimental conditions were inferior 
to the self-paced condition in terms of total 
errors. 

3. A higher rate of transmission of infor- 
mation was obtained for groups of two pre- 
sented at the fastest rate than for the self- 
paced condition. 

4. The delay period before the emission of 
the first response to a group of signals and 
the intervals between responses increased as 
a function of the number of items per group. 

5. These same measures were relatively un- 
influenced by variations in the rate of signal 
presentation. 

It was concluded (a) that reduction coding,
or coding by groups, can, under limited
conditions, result in performance which is
superior to the self-pacing, item-by-item mode
of operation, and (b) that the amount rather
than the rate at which information is
presented is more critical in determining the
temporal organization of the responses to
grouped signals. These results were discussed
in relation to conditions of the experiment
which may influence their generality.


Received October 29, 1956. 
A Comparative Study of Aptitude Patterns in Unskilled and
Skilled Psychomotor Performances¹


Edwin A. Fleishman²
Yale University 


The problem of predicting advanced levels 
of proficiency in skilled performance has re- 
ceived relatively little attention. For exam- 
ple, testing programs are most often evalu- 
ated against more immediate criteria of 
proficiency in lieu of intermediate or more 
ultimate criteria of performance. Yet, it is 
quite possible that the pattern of aptitudes 
contributing to individual differences in per- 
formance early in training may be quite dif- 
ferent from the aptitudes contributing to final 
performance levels. Recent evidence, e.g. 
(15, 16), suggests that this is true in the 
case of certain industrial and military tasks. 
There is a need for fundamental information 
on the kinds of aptitude variables which are 
most predictive of final asymptotes in per- 
formance after prolonged training. 

As one approach to this problem, we have
sought to identify the kinds of aptitudes
contributing to performance in a wide range of
different psychomotor tasks as practice on
these tasks continues (3, 6, 8, 10). Psychomotor
tasks have several advantages. Typically,
rapid improvement in performance occurs
with even brief amounts of practice. In
addition, they are inherently interesting and,
although they may be studied in the laboratory,
they have a great deal of “face validity.”
Moreover, the criteria of proficiency on such
tasks can be made unequivocal, and a high
degree of control over the training schedule and
procedure can be exercised. The methodology
in these studies involves a combination of
factor analysis design with experimental
laboratory methods.

1 This study was carried out while the writer was
with the Air Force Personnel and Training Research
Center. The work was done under ARDC Project
No. 7707, Task No. 37225, in support of the research
and development program of the Air Force Personnel
and Training Research Center, Lackland Air Force
Base, Texas. Permission is granted for reproduction,
translation, publication, use and disposal in
whole and in part by or for the United States
Government.

2 The writer is indebted to Mr. Walter E. Hempel,
Jr., for his valuable assistance in the conduct of this
study. The data were collected as part of a larger
study carried out by Dr. Jack A. Adams.

In general, our studies indicate that the par- 
ticular combinations of aptitudes contributing 
to performance on such tasks may change as 
practice continues and proficiency increases.
It has also been shown that these changes are 
progressive and systematic through the prac- 
tice period until a point later in practice 
where they become stabilized. In other words, 
the particular combination of aptitudes con- 
tributing to individual differences later in 
training on such tasks may be quite different 
from those contributing early in training. It 
then becomes important to establish what 
abilities are being sampled at different stages 
in performance on particular psychomotor 
tasks. Such knowledge would have implica- 
tions for future test development in this apti- 
tude area as well as for questions concerning 
the processes involved in the learning of com- 
plex perceptual-motor skills (see, e.g., [10]). 
In addition, these studies should help estab- 
lish the kinds of abilities and measures which 
best predict higher levels of proficiency in 
such skills. 

The typical design of our previous studies
has consisted in giving samples of 200-300
subjects extended practice on a complex
criterion laboratory task. In addition, the same
subjects receive a carefully selected,
comprehensive battery of printed and apparatus
reference tests. Correlations among scores taken
from different segments of the practice period
on the criterion task and from the reference
tests are then subjected to factor analysis.
The loadings of the various stages of
practice of the criterion task on the factors
defined by the reference tests specify the
changes in the factor pattern of this task as
practice continues. In addition to the finding
that such changes do occur, each of our
studies has resulted in a factor common only
to the stages of practice on the criterion task
itself and not defined by any of the reference
tests; moreover, loadings on this factor
increase progressively through the stages of
practice until later in training, when loadings
from trial to trial remain the same. This has
been found in the case of a serial-reaction
task (8), a visual-discrimination reaction time
task (10), and a continuous pursuit task (6).
Thus, it is possible that a large portion of the
variance at advanced proficiency levels on
such tasks is specific to habits and skills
acquired on the task itself and not defined by
any other test variables (see, e.g., Reynolds
[14]). This view, however, is quite pessimistic
from the point of view of aptitude testing.
Before adopting it, we have attempted to
explore other classes of variables, not
previously included in our reference batteries, in
an attempt to reduce this specific variance or
to better understand its nature.

Fig. 1. The complex coordination task.

Evidence against any simple specificity hy- 
pothesis has been presented in a recent corre- 
lational study (1). It was found that a com- 
bination of measures, including those taken 
from advanced practice scores on certain psy- 
chomotor tests, provided a better prediction 
of advanced proficiency on a criterion psycho- 
motor task than did a measure taken from 
early practice on the criterion task itself. 
Moreover, advanced measures taken from
several tasks may correlate more highly with
each other than do the early stage measures
on these same tasks. This raises a further
question of whether the “within-task” factors
found at advanced proficiency levels in the
separate analyses have anything in common
with each other; in other terms, are these
“within-task” factors really confined to each
individual task, or is there something in
common between these different tasks which is
found only at advanced levels of proficiency
on these tasks?³

The present study attempted to get at least 
a partial answer to this problem. 


Procedure 


The subjects were 200 basic trainee airmen. They 
received extended practice on seven different psycho- 
motor tasks. Considerably longer practice was given 
on one of these tasks, the Complex Coordination 
Test (see Fig. 1), which was considered the criterion 
task in the present study. Practice on this task was 
distributed over a two-day period. Extended prac- 
tice on the remaining six tasks was distributed within 
single continuous sessions. In addition, all subjects 
received a battery of printed and psychomotor refer- 
ence tests. The reference tests, the criterion task, 
and the other experimental practice tasks are de- 
scribed briefly in turn. More detailed references on 
these tests are indicated where possible.


Reference Tests 


Instrument Comprehension (12). For each item 
which presents views of cockpit instruments, the 
examinee must determine the proper position and 
orientation of an airplane.

Reaction Time (1, 5). Strike a button as quickly 
as possible by an arm-hand movement, in response to 
a light when it appears. Score is cumulated reaction 
time for a series of 10 reactions. 

Rate of Movement (1). Break the beams between
a series of photoelectric cells, one after another,
by making scalloped arm-hand movements as
rapidly as possible. Score is the number of times
the beam is broken in a two-minute trial.

Pattern Comprehension (12). A series of drawings
is presented which require visualization of
relationships between components of solids and their
unfolded flat projections.

Mechanical Principles (12). Pictorial items require
the comprehension of principles and mechanisms,
such as leverage or rotation and transformation
of motion, involved in the action and uses of
various mechanical devices.

3 The implications of these studies for a general
theory of learning and human ability have very
recently been presented by George Ferguson, in his
presidential address to the Canadian Psychological
Association, 1956 (2).

General Mechanics (12). Printed items require 
practical mechanical information dealing with the 
use or operation of familiar mechanical methods and 
devices. 

Speed of Identification (12). Pictorial items are 
presented in which the silhouette of an object must 
be identified when it is rotated and imbedded in a 
group of similar silhouettes. 

Visual Pursuit (1, 12). From a series of mazes or
irregularly curved lines, the task is to trace each line
visually from its beginning to its proper termination
point.


The Criterion Practice Task 


Complex Coordination (8, 13). Figure 1. The 
subject is required to make complex motor adjust- 
ments of stick and pedal controls in response to suc- 
cessively presented patterns of visual cues. Stimu- 
lus and response lights in three dimensions must all 
be aligned properly before a new setting is presented.
Subjects received 64 two-minute trials distributed 
over two days, with a morning and afternoon session 
each day. Scores recorded were the number of com- 
pleted settings during each trial. 


The Experimental Practice Tasks 


Rotary Pursuit (13). Figure 2. The subject at- 
tempts to keep a prod-stylus in contact with a small 
metallic target (.75 in. diameter) set in a rapidly re- 
volving disc (10.9 in. diameter). Score is total time 
on target during each of thirty 30-sec. trials sepa- 
rated by 1-min. rests.

Plane Control (13). Figure 3. The attitude of a 
model airplane (mounted in front of the subject) is 
varied irregularly in its roll, pitch, and yaw axes by 
a motor driven cam system. The subject attempts 
to keep the airplane in a straight and level attitude 
by making compensatory adjustments of stick and 
pedal controls. Score is the amount of time the 
plane was kept straight and level during each of 
thirty 1-min. test periods separated by 30-sec. rests.

Kinesthetic Coordination (1). Figure 4. The sub- 
ject attempts to coordinate his pressure against a 
stick and pedal control simultaneously in order to 
match the considerable tension exerted against these 
controls through a hydraulic system. When he suc- 
ceeds in countermatching the control forces, a new 
“setting” of control tensions must then be counter- 
matched. Score was the number of settings matched 
in each of ten 1-min. trials separated by 1-min. rests. 

Unidimensional Matching (1). Figure 5. The subject
is confronted with a panel and two parallel
horizontal rows of lights, one red and one green. He
attempts to match a stimulus red light with a green
light through appropriate movements of a
hand-operated control handle. A new “light setting” is
presented after each successful match held for .5 sec.
Score is the number of matchings for each of fifteen
30-sec. trials separated by 30-sec. rests.

Two-Hand Matching (1). Figure 6. The apparatus
is similar to Unidimensional Matching except
the red row of stimulus lights has a parallel row of
green lights above and below it. The subject must
match lights simultaneously in each row of green
lights with the red lights through movements of two
handles. The movements of each hand sometimes
correspond and sometimes are antagonistic to each
other and bear a complex relationship to the
position of the stimulus lights from setting to setting.
Score is the number of simultaneous matchings made
and held for .5 sec. during each of thirty 1-min. trials
separated by 30-sec. rests.

Discrimination Reaction Time (10, 12). Figure 7.
The examinee manipulates one of four toggle switches
as quickly as possible in response to a series of visual
stimulus patterns differing from one another with
respect to the spatial arrangement of their component
parts (e.g., position of a lighted red lamp relative to
a lighted green lamp). Score is the cumulated
reaction time for each of a series of reaction trials, where
a trial is ten successive reactions. Two minutes' rest
was given after every 80 reactions.

Data Analysis Procedure 


Two independent factor analyses were carried out and the results compared. One analysis was based on the intercorrelations among scores on the eight reference tests (Variables 1-8), scores taken from four separate segments of practice on the criterion task (Variables 9-12, Table 2), and scores taken during the earliest stages of practice on each of the remaining six experimental psychomotor tasks (Variables 13-18). The second analysis was identical to this in that exactly the same reference test scores and scores from the criterion task were used. The only difference between the two analyses was that in Analysis 2, scores taken during the last trials of practice were substituted for the early stage scores used in Analysis 1, for the six experimental psychomotor tasks only (Variables 13-18). Table 1 shows the scores included in each analysis for the six experimental tasks. In each analysis, factors were extracted by Thurstone’s centroid method (17) and orthogonal rotations were carried out using Zimmerman’s graphical procedure (18).

Table 1
Scores* Included in Each Analysis for the Six Experimental Psychomotor Tasks

Task                             Analysis 1     Analysis 2
Rotary Pursuit                   Trials 1-8     Trials 25-30
Plane Control                    Trials 1-10    Trials 21-30
Kinesthetic Coordination         Trials 1-5     Trials 6-10
Unidimensional Matching          Trials 1-5     Trials 11-15
Two-Hand Matching                Trials 1-5     Trials 26-30
Discrimination Reaction Time     Trials 1-5     Trials 20-24

* Scores included are cumulated totals across the trials indicated.
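A simplified sketch of centroid extraction, in the spirit of Thurstone’s method, may help show how loadings arise from a correlation matrix. Two details are assumptions rather than a description of the original hand computations: the diagonal of each (residual) matrix is filled with the largest absolute off-diagonal correlation in its column, and variables are reflected one at a time until no column sum is negative.

```python
import math


def centroid_factors(r, n_factors):
    """Extract up to n_factors centroid factors from a correlation matrix r.

    r is a symmetric list of lists with at least two variables. Returns a list
    of factors, each a list of loadings (one per variable). Simplified sketch.
    """
    n = len(r)
    resid = [row[:] for row in r]  # work on a copy; r is not modified
    factors = []
    for _ in range(n_factors):
        # Assumed communality estimate: largest absolute off-diagonal entry.
        for j in range(n):
            resid[j][j] = max(abs(resid[i][j]) for i in range(n) if i != j)
        # Reflect variables (flip sign of row and column) until no column sum is negative.
        signs = [1] * n
        while True:
            sums = [sum(resid[i][j] for i in range(n)) for j in range(n)]
            worst = min(range(n), key=lambda k: sums[k])
            if sums[worst] >= 0:
                break
            signs[worst] = -signs[worst]
            for i in range(n):
                if i != worst:
                    resid[i][worst] = -resid[i][worst]
                    resid[worst][i] = -resid[worst][i]
        total = sum(sums)
        if total <= 0:
            break  # nothing left to extract
        loadings = [s / math.sqrt(total) for s in sums]
        # Remove this factor, then undo the reflection on the residual matrix.
        for i in range(n):
            for k in range(n):
                resid[i][k] = (resid[i][k] - loadings[i] * loadings[k]) * signs[i] * signs[k]
        factors.append([signs[j] * loadings[j] for j in range(n)])
    return factors


# Hypothetical correlations generated by one factor with loadings .8, .7, .6, .5.
R = [
    [1.00, 0.56, 0.48, 0.40],
    [0.56, 1.00, 0.42, 0.35],
    [0.48, 0.42, 1.00, 0.30],
    [0.40, 0.35, 0.30, 1.00],
]
first = centroid_factors(R, 1)[0]
print([round(a, 2) for a in first])
```

For this made-up matrix the first centroid factor comes out at approximately .75, .71, .63, and .55, close to the generating loadings; the gap reflects the crude diagonal estimate.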


Results 


Table 2 presents the orthogonal solution of rotated factor loadings obtained in Analysis 1. Table 3 presents the orthogonal solution of rotated factor loadings for Analysis 2, in which the scores taken late in practice were substituted for Variables 13-18.
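Zimmerman’s procedure rotates axes graphically, two factors at a time. Numerically, each such step amounts to a planar orthogonal rotation of one pair of factor columns. The sketch below applies that step to hypothetical unrotated loadings and checks that an orthogonal rotation leaves each variable’s communality unchanged; choosing the angle by inspecting a plot, as the graphical method does, is not modeled here.

```python
import math


def rotate_pair(loadings, p, q, theta):
    """Rotate factor columns p and q of a loading matrix through angle theta (radians).

    Each row holds one variable's loadings; only columns p and q change.
    """
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for row in loadings:
        new_row = list(row)
        new_row[p] = c * row[p] + s * row[q]
        new_row[q] = -s * row[p] + c * row[q]
        rotated.append(new_row)
    return rotated


def communalities(loadings):
    """Sum of squared loadings for each variable."""
    return [sum(a * a for a in row) for row in loadings]


# Hypothetical unrotated loadings for three variables on two factors.
unrotated = [[0.60, 0.40], [0.50, -0.30], [0.70, 0.10]]
rotated = rotate_pair(unrotated, 0, 1, math.radians(30))
for before, after in zip(communalities(unrotated), communalities(rotated)):
    print(round(before, 4), round(after, 4))
```

Because the rotation only redistributes each variable’s common variance among the factors, the two printed columns agree, which is why the rotated solutions in Tables 2 and 3 can be compared factor by factor without changing communalities.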

4 The intercorrelations among the scores in Analysis 1 and among the scores in Analysis 2, as well as the two centroid factor matrices, have been deposited with the American Documentation Institute. Order Document No. 5283 from the ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress, Washington, D. C., remitting in advance $1.25 for microfilm or $1.25 for photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.

The results indicate that the eight factors identified were the same in both analyses with very little qualification. To clarify these complex results, the factors will be described in order. Variables with loadings over .30 in one or both analyses will be listed for comparison purposes. The factors will be defined by the reference tests; next we will examine changes in the loadings of the practice stages on the criterion task on each factor. Finally, we will see whether substituting a late stage score (Analysis 2) for an early stage score (Analysis 1) changed the factor loadings for the experimental psychomotor tasks. To observe changes with practice on the criterion task, read down the columns of loadings; to observe changes in the experimental tasks, read across from Analysis 1 to Analysis 2.
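The listing and reading rules just described can be expressed as a small filter. This sketch assumes the .30 criterion is applied to absolute loadings and that a change is read as the Analysis 2 loading minus the Analysis 1 loading. The Discrimination Reaction Time values (.28 and .46 on Factor I) are taken from the Factor I listing; the second entry is hypothetical.

```python
def listed_variables(loadings, threshold=0.30):
    """Names whose loading exceeds the threshold (absolute value) in either analysis.

    loadings maps a variable name to (analysis_1, analysis_2).
    """
    return [
        name
        for name, (first, second) in loadings.items()
        if abs(first) > threshold or abs(second) > threshold
    ]


def change_across(loadings):
    """Analysis 2 loading minus Analysis 1 loading, rounded to two places."""
    return {name: round(second - first, 2) for name, (first, second) in loadings.items()}


factor_i = {
    "Discrimination Reaction Time": (0.28, 0.46),  # Factor I values from the listing
    "Hypothetical Task": (0.12, 0.20),             # illustrative only
}
print(listed_variables(factor_i))  # ['Discrimination Reaction Time']
print(change_across(factor_i))
```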

Factor I is defined by the Rate of Move- 
ment and Reaction Time tests, both of which 
require a series of rapid, gross arm move- 
ments. These tests have identified a factor 
called simply Speed of Arm Movement in 
previous analyses (9, 15, 17). 

No.  Variables                         Loading
                                       Analysis 1   Analysis 2
     Reference Tests
2    Reaction Time                     ?            .65
3    Rate of Movement                  ?            ?
     Criterion Task
9    Trials 1-5                        ?            ?
10   Trials 12-16                      ?            ?
11   Trials 49-53                      ?            .38
12   Trials 60-64                      ?            ?
     Experimental Tasks
13   Rotary Pursuit                    ?            ?
14   Plane Control                     ?            ?
18   Discrimination Reaction Time      .28          .46


The close correspondence in loadings for these tests achieved in the two analyses is readily observed. In each analysis it may also be seen that loadings of the criterion task (Variables 9-12) increase from insignificant values to substantial loadings through the practice period. With respect to the experimental tasks, we find that Rotary Pursuit, Plane Control, and Discrimination Reaction Time show loadings below .30 on this factor in Analysis 1. However, where scores taken late in practice on these tasks are substituted in Analysis 2, loadings increase substantially. The finding that Speed of Arm Movement contributes important variance late in proficiency on more complex tasks is consistent with previous findings (6, 8, 10).
Table 2
Rotated Factor Loadings* for Analysis 1

No.  Variable                              Factors** I-IX
1    Instrument Comprehension
2    Reaction Time
3    Rate of Movement
4    Pattern Comprehension
5    Mechanical Principles
6    General Mechanics
7    Speed of Identification
8    Visual Pursuit
9    Complex Coord. Trials 1-5
10   Complex Coord. Trials 12-16
11   Complex Coord. Trials 49-53
12   Complex Coord. Trials 60-64
13   Rotary Pursuit
14   Plane Control
15   Kinesthetic Coordination
16   Unidimensional Matching
17   Two-Hand Matching
18   Discrimination Reaction Time

* Decimals omitted.
** Factors are identified as follows: I, Speed of Arm Movement; II, Visualization; III, Perceptual Speed; IV, Mechanical Experience; V, Spatial Orientation; VI, Response Orientation; VII, Fine Control Sensitivity; VIII, Complex Coordination Task "Within" Factor; IX, Residual Factor.


Table 3
Rotated Factor Loadings* for Analysis 2

No.  Variable                              Factors** I-IX
1    Instrument Comprehension
2    Reaction Time
3    Rate of Movement
4    Pattern Comprehension
5    Mechanical Principles
6    General Mechanics
7    Speed of Identification
8    Visual Pursuit
9    Complex Coord. Trials 1-5
10   Complex Coord. Trials 12-16
11   Complex Coord. Trials 49-53
12   Complex Coord. Trials 60-64
13   Rotary Pursuit
14   Plane Control
15   Kinesthetic Coordination
16   Unidimensional Matching
17   Two-Hand Matching
18   Discrimination Reaction Time

* Decimals omitted.
** Factors are identified as follows: I, Speed of Arm Movement; II, Visualization; III, Perceptual Speed; IV, Mechanical Experience; V, Spatial Orientation; VI, Response Orientation; VII, Fine Control Sensitivity; VIII, Complex Coordination "Within-Task" Factor; IX, Residual Factor.
Factor II is defined chiefly by Pattern Comprehension and Mechanical Principles in both analyses. These tests have served as reference tests of Visualization (7, 9, 11, 12). Tests of this factor require mental manipulation of visual objects, in which it is usually necessary to move, turn, twist, or invert one or more parts of a configuration in a certain specified sequence. The examinee is required to recognize the new position, location, or changed appearance after the imagined modification.


No.  Variables                         Loading
                                       Analysis 1   Analysis 2
     Reference Tests
4    Pattern Comprehension             ?            ?
5    Mechanical Principles             .53          ?
7    Speed of Identification           ?            .38
     Criterion Task
9    Trials 1-5                        ?            ?
10   Trials 12-16                      ?            ?
11   Trials 49-53                      ?            ?
12   Trials 60-64                      ?            ?


It can be seen that this factor contributes to performance in the criterion task at the earliest stage of learning, but decreases in importance through practice. None of the experimental tasks showed loadings on this factor.

Factor III is defined by reference tests of the Perceptual Speed factor (9, 11, 12). This factor represents the ability to make rapid comparisons of visual forms and to note similarities or differences in detail.


No.  Variables                         Loading
                                       Analysis 1   Analysis 2
     Reference Tests
7    Speed of Identification           .44          ?
8    Visual Pursuit                    .38          ?
4    Pattern Comprehension             .18          .36
     Experimental Tasks                (E)          (L)
16   Unidimensional Matching           -.34         .09


None of the stages of practice on the criterion task is loaded on this factor. The only experimental task involving this factor is the Unidimensional Matching task, and it should be noted that this task samples this factor only in the early stage of practice.


Edwin A. Fleishman 


Factor IV is confined to the two tests, Mechanical Principles and General Mechanics, which are reference tests of the Mechanical Experience factor (11, 12).


No.  Variables                         Loading
                                       Analysis 1   Analysis 2
     Reference Tests
5    Mechanical Principles             ?            ?
6    General Mechanics                 ?            ?


None of the experimental tasks has a load- 
ing as high as .30 on this factor. 

Factor V is identified as Spatial Orientation from the loadings of Instrument Comprehension and Pattern Comprehension. Moreover, the standard administration of the Complex Coordination test (approximated by Variable 9 in the present study) and the Discrimination Reaction Time test (approximated by Variable 18, Analysis 1) have served as reference tests of Spatial Orientation in previous studies.


No.  Variables                         Loading
                                       Analysis 1   Analysis 2
     Reference Tests
1    Instrument Comprehension          ?            ?
4    Pattern Comprehension             ?            .31
     Criterion Task
9    Trials 1-5                        ?            .36
10   Trials 12-16                      ?            ?
11   Trials 49-53                      ?            ?
12   Trials 60-64                      ?            .13
     Experimental Tasks                (E)          (L)
18   Discrimination Reaction Time      .46          .38


The factor has been defined as representing the ability to comprehend the spatial arrangement of a visual stimulus pattern, primarily with respect to the examinee’s body as the frame of reference (7, 12). The present study confirms previous indications (8) that the importance of this factor in the criterion task decreases after early proficiency on the task. The results also confirm findings from an independent study (10) that this factor contributes to Discrimination Reaction Time performance late as well as early in practice, but that its importance decreases later in practice. The decrease with practice is not as marked in the present study, however.⁵

Factor VI is confined to the psychomotor tasks. It appears to be the same as the factor named Response Orientation in our previous analyses (7, 9). It has been defined as the ability to make rapid discriminations as to direction of movement or choice of movement when one is actually making the movement. The factor does not involve the interpretation of the spatial characteristics of the stimulus (as in Spatial Orientation), but rather making the correct movement in relation to the correct stimulus. In other words, "Given this stimulus, which way should I move?"


No.  Variables                         Loading
                                       Analysis 1   Analysis 2
     Criterion Task
9    Trials 1-5                        .38          ?
10   Trials 12-16                      ?            .44
11   Trials 49-53                      .38          ?
12   Trials 60-64                      ?            .44
     Experimental Tasks                (E)          (L)
14   Plane Control                     ?            .27
15   Kinesthetic Coordination          .35          ?
16   Unidimensional Matching           .19          ?
17   Two-Hand Matching                 .70          ?
18   Discrimination Reaction Time      .40          .28


As can be seen, this factor is found in both analyses as common to all stages of practice on the criterion task. With respect to the experimental tasks, it is shown to be best measured by the Two-Hand Matching task. This is consistent with previous interpretations of this factor, since this task involves a very complex stimulus-response relationship (see above task description). It is also seen that this task continues to show a high loading on this factor in Analysis 2, where the last stage of practice was substituted for the early stage score. It is also observed that the Unidimensional Matching task involves this factor late in practice but not early. Plane Control and Kinesthetic Coordination measure this factor at a fairly stable level through practice, but Discrimination Reaction Time shows a marked decrease in loading with practice. Thus, we see an example of a factor decreasing as a function of practice in some tasks, increasing in others, and remaining stable in certain other tasks.

5 In this previous study of the Discrimination Reaction Time task, a drop in loading from .60 to .33 was found on a Spatial factor. Subsequent studies (7, 9) have split this factor into Spatial Orientation (Factor V) and Response Orientation (Factor VI). The present results indicate that the decrease in spatial loadings found previously in this Discrimination Reaction Time task is mainly a function of a decrease in Response Orientation (see Factor VI).

Factor VII is the factor labeled Fine Con- 
trol Sensitivity in a previous analysis (9). 
During the wartime AAF studies (4, 12, 13) 
this factor was called Psychomotor Coordina- 
tion, but recent work has shown the essential 
feature of this factor to be the ability to make 
highly controlled, but not overcontrolled, 
sensitive movements of moderate scope. We 
have called this factor Psychomotor Coordi- 
nation-I in a previous study (9), but the pres- 
ent label is less ambiguous. 


No.  Variables                         Loading
                                       Analysis 1   Analysis 2
     Criterion Task
9    Trials 1-5                        ?            ?
10   Trials 12-16                      ?            ?
11   Trials 49-53                      ?            .45
12   Trials 60-64                      ?            .45
     Experimental Tasks
13   Rotary Pursuit                    ?            ?
14   Plane Control                     ?            ?
16   Unidimensional Matching           ?            .19


The chief definers of this factor have been 
Rotary Pursuit and Complex Coordination 
(represented by early stage practice on these 
tests in the present study). 

Our results show this factor to be impor- 
tant at all stages of practice in the criterion 
task. In Rotary Pursuit and Unidimensional 
Matching, the importance of this factor drops 
sharply in Analysis 2 where late stage scores 
were substituted. Plane Control maintains a 
stable loading on this factor in both stages of 
practice. 

Factor VIII is not defined by any of the reference tests or any of the experimental tasks. Loadings on this factor are confined to the four stages of practice on the criterion task. This is labeled simply the Within-Task factor.


No.  Variables                         Loading
                                       Analysis 1   Analysis 2
     Criterion Task
9    Trials 1-5                        ?            .24
10   Trials 12-16                      ?            .39
11   Trials 49-53                      .52          ?
12   Trials 60-64                      .58          .54


Further, it can be seen that loadings of this 
task on this factor increase through the stages 
of practice. The important thing to note is 
that this is true in both analyses. In other 
words, inclusion of the late stage scores from 
other tasks in Analysis 2 did not result in the 
identification of this factor as common to ad- 
vanced proficiency levels on a number of dif- 
ferent tasks. 

Factor IX is a residual factor in both analyses.


Discussion 


It has been shown that although the same 
eight factors were identified in each analysis, 
the pattern of these factors for the six ex- 
perimental psychomotor tasks was different 
when the late stage scores were substituted for 
the early stage scores for these same tasks, as 
was done in Analysis 2. For example, the 
Discrimination Reaction Time task measures 
Spatial Orientation and Response Orientation 
early, but later in practice individual differ- 
ences in performance on the task are more a 
function of Speed of Arm Movement. The 
Complex Coordination task, which is the only 
task for which several stages of practice were 
included within each analysis, shows a pro- 
gressive drop within each analysis in Spatial 
Orientation and Visualization, a progressive 
increase in Speed of Arm Movement and rela- 
tively stable high loadings in Fine Control 
Sensitivity and Response Orientation. A fairly 
stable factor pattern was achieved, however, 
in the case of the Two-Hand Matching task, 
which is mostly a matter of Response Ori- 
entation in each analysis. 

The cross-sectional comparison of the two 
analyses confirms and extends the findings of 
our previous extended practice studies (3, 6, 
8, 10). Moreover, the inclusion of several
measures of advanced proficiency in the same 
analysis did not result in the identification of 
new factors; that is, no new factors were 
found which were confined only to advanced 
measures. The question was one of degree 
of involvement of particular factors already 
identified. The correlations among advanced 
measures on different tasks were accounted for 
entirely by familiar common factors found in 
other analyses. For example, the correlation 
between the Complex Coordination and Plane 
Control tasks is higher when scores later in 
practice are used. The reason is that each 
task measures a Speed of Arm Movement 
factor late in training but not in the early 
stages of proficiency. The Fine Control Sen- 
sitivity and Response Orientation factors are 
common to both these tasks at early as well 
as late stages of proficiency. 
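The argument above rests on a property of an orthogonal factor solution: the part of the correlation between two tasks explained by the common factors is the sum, over factors, of the products of their loadings. The sketch uses hypothetical loadings (not the values in Tables 2 and 3) on Speed of Arm Movement, Fine Control Sensitivity, and Response Orientation to show how a shared Speed of Arm Movement loading late in practice raises the reproduced correlation.

```python
def reproduced_correlation(loadings_a, loadings_b):
    """Correlation implied by the common factors in an orthogonal solution."""
    return sum(a * b for a, b in zip(loadings_a, loadings_b))


# Hypothetical loadings on (Speed of Arm Movement, Fine Control Sensitivity,
# Response Orientation); not the values reported in the paper.
early = {"Complex Coordination": (0.10, 0.45, 0.40), "Plane Control": (0.05, 0.40, 0.35)}
late = {"Complex Coordination": (0.45, 0.45, 0.40), "Plane Control": (0.40, 0.40, 0.35)}

r_early = reproduced_correlation(early["Complex Coordination"], early["Plane Control"])
r_late = reproduced_correlation(late["Complex Coordination"], late["Plane Control"])
print(round(r_early, 3), round(r_late, 3))  # 0.325 0.5
```

The Fine Control Sensitivity and Response Orientation terms are identical in both stages, so the entire increase comes from the Speed of Arm Movement product.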

It should also be noted that there was no 
decrease in the communalities for these ex- 
perimental tasks when the late stage scores 
were substituted for early proficiency scores. 
In fact, for Plane Control and Unidimensional 
Matching the communalities were consider- 
ably higher in the late stage analysis. Thus, 
for Plane Control, the percentage of common 
variance increased from 41% to 58%, and for
Unidimensional Matching the increase was 
from 45% to 66%. These results indicate 
that there was no over-all decrease in pre- 
dictability of performance from independent 
measures as practice continues. It is merely 
the particular combination of common factors 
that changes. Moreover, it can be seen that 
the kinds of aptitudes contributing variance 
in advanced levels in the psychomotor task 
are defined by other psychomotor measures, 
even though aptitudes measured by printed 
tests contribute much of the variance at early 
proficiency levels. 
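In an orthogonal solution the communality of a variable is the sum of its squared loadings, and multiplying by 100 gives the percentage of common variance quoted above. The loadings below are hypothetical, chosen only so that the arithmetic reproduces the 41% and 58% reported for Plane Control; they are not the paper’s loadings.

```python
def percent_common_variance(loadings):
    """Communality (sum of squared loadings) expressed as a percentage."""
    return 100 * sum(a * a for a in loadings)


# Hypothetical loadings, chosen only so the arithmetic matches 41% and 58%.
plane_control_early = [0.5, 0.4]
plane_control_late = [0.7, 0.3]
print(round(percent_common_variance(plane_control_early), 1))  # 41.0
print(round(percent_common_variance(plane_control_late), 1))   # 58.0
```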

The inclusion of several practice stages for the Complex Coordination task within each analysis allowed the appearance of the factor specific to practice on this task. This was not possible with the other tasks, as only one score was included in each analysis. It can be seen that this "within-task" factor was just as pronounced in Analysis 2 as in Analysis 1, and that the importance of this factor increased with practice in both analyses. Apparently, the inclusion of late stage scores from other tasks in Analysis 2 did not help break down this "within-task" variance or help define it.

It is possible that our analyses thus far have not included the appropriate reference variables for this purpose. For example, one hypothesis is that kinesthetic abilities play an increasingly important role at higher levels of proficiency (see, e.g., 16). Another hypothesis is that individual differences at advanced levels of proficiency are partly a function of skill in integrating well-learned component skills, while earlier in training these component abilities are the factors contributing to individual differences. In one of our most recent studies, not yet complete, we are testing this hypothesis by including a wide range of independent measures of so-called integration abilities along with our practiced tasks. There is evidence from previous factor analyses (9, 11, 12) that such integration abilities exist, but measures of these have never been investigated in relation to problems of learning and training. Our hypothesis is that loadings of certain tasks on an integration factor (or factors) may increase as a function of practice and that this factor may account for at least some of the variance now included in our "within-task" factor.


Summary 


A cross-sectional and longitudinal comparison was made of abilities involved at early and late stages of proficiency on a variety of complex psychomotor tasks. The methodology involved giving extended practice on seven different tasks to the same subjects, who also received a battery of reference tests. Factor analysis techniques were applied to the experimental data. Factors were defined by the reference tests, and the loadings of the different stages of practice on the psychomotor tasks on these factors were examined. The results confirm and extend previous findings in this series, which indicate considerable, but systematic, changes in the patterns of abilities contributing to proficiency on complex tasks as training continues and proficiency increases.


Some abilities were shown to increase in im- 
portance, others were shown to decrease, and 
others remained at a fairly stable level through 
the practice period. It was found, for exam- 
ple, that the kinds of aptitudes contributing 
variance at advanced proficiency in the psy- 
chomotor tasks were defined by other psycho- 
motor measures, even though printed tests 
contributed much of the variance early in 
training. It was also shown that prediction 
of late proficiency from independent measures 
may become increasingly difficult. However, 
some tasks showed no decrease in common 
variance with other tasks as a function of 
practice. The results are related to previous 
work and the implications discussed. 

It is hoped that our studies along these 
lines may contribute important leads toward 
isolating and predicting classes of variables 
important to high levels of proficiency on 
complex psychomotor tasks. 